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Abstract 


In  machining  operations,  the  precision  of  the  workpiece  dimensions  depends  on  the  accuracy  of  the 
relative  position  of  the  cutting  tool  and  the  workpiece.  Among  the  key  factors  that  affect  the 
accuracy  of  this  relative  position  are  the  geometric  errors  of  the  machine  tool  and  the  thermal 
effects  on  these  geometric  errors.  Recent  work  on  developing  models  to  predict  volumetric  errors 
on  NC  lathes  has  led  to  a synthesis  technique  the  combines  the  modeling  of  individual  axis  related 
geometric-thermal  components  by  way  of  a rigid  body  kinematic  model  to  produce  predicted  errors 
in  the  work  volume  of  the  machine.  Homogeneous  coordinate  transformations  are  the  tools  used 
to  synthesize  the  individual  error  component  models  into  the  unified  volumetric  error  model.  The 
individual  error  components  are  often  modeled  as  polynomial  functions  of  a component’s  slide 
position  and  the  temperature  state  of  the  machine,  which  is  typically  measured  by  thermal  sensors 
located  on  the  machine  tool.  An  alternative  method  of  modeling  these  component  errors  is 
described  in  this  report.  Neural  network  computing  is  shown  to  be  a viable  technique  for 
developing  mappings  between  machine  tool  component  error  measurements  and  the  vector 
consisting  of  both  a component  slide  position  and  the  temperature  state  of  the  machine  as  reported 
by  the  thermal  sensors.  The  conjugate  gradient  algorithm,  used  to  compute  the  optimum  neural 
network  weights  for  the  machine  tool  error  components,  is  described.  A case  study  of  the  mapping 
results  for  one  component  error  of  an  actual  NC  lathe  is  given.  Finally,  the  source  codes  for  the 
neural  network  algorithm  and  the  conjugate  gradient  algorithm  are  given  in  FORTRAN. 

Keywords:  conjugate  gradient  algorithm;  gradient  descent  algorithm;  machine  tool  error 

compensation;  machine  tool  errors;  neural  networks;  nonlinear  optimization 
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1.0  Introduction 


In  machining  operations,  the  precision  of  the  workpiece  dimensions  depends  on  the  accuracy  of  the 
relative  position  of  the  cutting  tool  and  the  workpiece.  Among  the  key  factors  that  affect  the 
accuracy  of  this  relative  position  are  the  geometric  errors  of  the  machine  tool  and  the  thermal 
effects  on  these  geometric  errors.  Geometric  errors  are  caused  by  the  unwanted  motions  of 
machine  elements  such  as  carriages,  cross-slides,  and  work-tables.  These  motions  occur  because 
of  geometric  imperfections  and  misalignments  of  the  machine  tool  design.  Heat  generated  by  the 
machine  tool  and  the  cutting  operation  causes  temperature  changes  of  the  machine  tool  elements 
and  environment.  Due  to  the  complex  geometry  of  the  machine  structure,  concentrated  heat 
sources  such  as  the  drive  motors  and  spindle  bearings,  create  thermal  gradients  along  the  machine 
structure.  Spindle  growth,  lead  screw  expansion,  and  a significant  part  of  the  machine  structure 
deformation  are  the  results  of  these  temperature  changes  and  gradients;  therefore,  reducing  the 
geometric  and  thermally-induced  errors  of  a machine  tool  is  a key  requirement  for  improving  the 
workpiece  accuracy.  In  addition  to  the  improved  workpiece  accuracy,  productivity  is  increased  by 
the  elimination  of  the  nonproductive  warm-up  cycle  of  1-2  hours  commonly  used  to  reach  thermal 
equilibrium  when  machining  precision  parts. 

Recent  work  by  Donmez  and  his  colleagues  [1,2,3],  on  developing  models  to  predict  volumetric 
errors  of  NC  lathes,  has  led  to  a synthesis  technique  that  combines  the  modeling  of  individual  axis 
related  geometric-thermal  error  components  by  way  of  a rigid  body  kinematic  model  to  produce 
predicted  errors  in  the  work  volume  of  the  machine.  Wu  [4]  describes  several  other  error 
component  modeling  techniques.  For  example,  characterizing  a 3-axis  machine  tool  requires  21 
geometric  error  components  plus  spindle  errors.  Some  of  these  error  components  are  purely 
temperature  related  and  others  are  functions  of  both  temperature  and  nominal  position.  For  a 
formal  definition  of  the  various  machine  tool  error  components  see  Hocken  [20]. 

Homogeneous  coordinate  transformations  under  rigid  body  kinematic  assumptions  have  been  used 
by  Donmez  [1,5  ] to  synthesize  individual  error  component  models  into  a unified  volumetric  error 
model.  The  rigid  body  kinematic  sissumption  implies  that  the  machine  component  errors  at  each 
slide  depend  on  their  own  coordinate  system  and  are  not  affected  by  the  movements  of  other  slides. 
That  is,  measurements  on  each  slide  are  independent  with  regard  to  position  on  other  slides.  The 
end  result  of  the  synthesis  produces  volumetric  components  as  functions  of  several  individual  error 
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components  and  nominal  tool  positions. 


Individual  error  components  are  often  modeled  as  polynomial  ftmctions  of  the  slide  position  and 
temperature  state  of  the  machine,  which  is  typically  measured  by  thermal  sensors  located  on  the 
machine  tool.  For  a survey  of  these  measurement  techniques  see  Wu  [4].  For  a specific  application 
to  an  NC  lathe  see  Donmez  [5]. 

In  this  report  we  consider  an  alternative  method  of  modeling  the  individual  error  components. 
Instead  of  polynomial  functions  of  axis  position  and  selected  temperature  measurements  we 
consider  the  use  of  neural  networks  as  a technique  of  developing  mappings  between  patterns  of 
recorded  position  and  machine  thermal  states  and  measured  component  errors.  An  object  of  this 
modeling  is  to  determine,  if  possible,  limits  on  the  number  of  data  samples  required  and  how  many 
and  where  thermal  sensors  should  be  placed.  This  report,  however,  concentrates  on  the 
documentation  of  a particular  neural  net  program  and  its  application  to  sample  data  sets. 

Neural  network  computing  is  a computational  method  that  uses  networks  of  simple  processing 
elements  or  neural  nodes  to  implement  a desired  mapping  relationship  between  input  and  output 
data  patterns.  A neural  node  is  a computational  device  that  takes  a number  of  inputs  from  either 
external  sources  or  other  nodes,  performs  sums  of  weighted  inner  products  of  these  inputs,  applies 
a fixed  nonlinear  transformation  to  this  inner  product  and  passes  the  result  on  to  the  next  layer  of 
neural  nodes  or  to  external  outputs.  Although  neural  networks  have  their  early  history  connected 
with  biological  questions  of  how  the  human  brain  works,  certain  forms  of  neural  networks,  called 
multilayer  feedforward  networks,  can  be  shown  to  produce  functions  that  will  uniformly 
approximate  continuous  functions  of  many  variables  in  the  sense  of  mathematical  approximation 
theory  (Cybenko  [13]).  For  a discussion  of  the  biological  background  of  neural  networks  see  Ritter, 
Martinetz  and  Schulten  [6].  For  a discussion  of  the  general  mathematical  problems  of  neural 
computing  see  Cybenko  [7]. 

The  neural  networks  computer  program  used  in  this  study  is  a modified  version  of  one  develped 
by  Dr.  James  L.  Blue  at  NIST  for  a study  of  optical  character  recognition  (see  Grother  and  Blue 
[8]).  The  optimization  portion  of  the  program  is  based  on  the  scaled  conjugate  gradient  algorithm 
of  M0ller  [9].  Stefan  Leigh  of  the  Statistical  Engineering  Division  at  NIST  developed  the 
DATAPLOT  output  described  in  section  9.  The  authors  also  wish  to  acknowledge  John  Meyer, 
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Phil  Nanzetta  and  Don  Blomquist  for  their  continued  support. 


2.0  Machine  Tool  Error  Compensation 


Error  compensation  is  defined  as  a method  of  cancelling  the  effects  of  systematic  errors  either  by 
directly  or  indirectly  measuring  these  errors,  or  by  predicting  them  using  a model  previously 
established  for  the  process.  For  a discussion  of  the  techniques  involved  in  machine  tool  error 
compensation  see  Donmez  [5,  21],  Zhang  [10].  Zhang  describes  geometric  error  corrections  for 
coordinate  measuring  machines  whereas  Donmez  details  the  application  of  geometric-thermal  error 
correction  to  an  NC  lathe.  The  errors  for  which  the  system  compensates  are  quasistatic  geometric 
and  thermally-induced  errors.  These  errors  are  predicted  by  using  relationships  established  between 
systematic  errors  and  particular  machine  tool  temperature  profiles.  In  the  process  of  building  these 
relationships,  the  geometric  errors  are  measured  using,  for  example,  laser  interferometry  and  high 
precision  capacitance  probes.  The  temperature  profile  is  determined  by  monitoring  thermocouples 
placed  in  critical  locations.  Some  of  these  locations  are  the  leadscrew  nuts  and  bearing  housings, 
the  spindle  bearing  housings,  the  headstock,  the  bed,  several  points  on  the  cross  slide  and  the 
carriage  bodies. 

At  present  the  relationships  for  each  type  of  error,  such  as  linear  displacement,  straightness,  yaw, 
and  orthogonality,  are  established  by  applying  least-squares  curve  fitting  techniques  to  the  error 
data  and  the  corresponding  temperature  profiles  for  each  element  of  the  machine.  After  predicting 
each  error  component,  the  compensation  system  uses  the  principles  of  rigid  body  kinematics  to 
combine  the  error  components  in  order  to  find  the  error  at  the  cutting  tool. 

The  actual  error  compensation  is  achieved  by  sending  the  error  values  to  the  machine  controller. 
The  machine  tool  controller  acts  on  the  compensation  values  in  software,  thus,  no  modification  to 
the  machine  control  hardware  is  necessary. 

3.0  Kinematic  Model 

The  rigid  body  kinematic  model  referred  to  in  section  2.0  is  a general  mathematical  model  that 
calculates  the  vector  error  at  the  cutting  tool  from  a large  number  of  reproducible  error 
components.  These  components  correspond  to  errors  contributed  by  machine  structure  elements. 
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The  individual  errors  are  decomposed  into  their  geometric  and  thermally-induced  error 
components. 

3.1  General  Model  Structure 

The  mathematical  model  used  to  predict  volumetric  errors,  is  generated  by  using  homogeneous 
transformation  matrix  manipulations,  with  the  assumptions  of  rigid  body  kinematics.  These 
transformations  describe  the  spatial  relationships  between  the  machine  tool  structural  elements. 
A homogeneous  coordinate  transformation  matrix  in  three-dimensional  space  is  a 4 X 4 matrix. 
It  is  used  to  express  a homogeneous  coordinate  vector  in  one  coordinate  system  with  respect  to 
another  coordinate  system.  Similarly,  a homogeneous  transformation  matrix  can  be  used  to 
represent  one  coordinate  system  with  respect  to  another  or  reference  coordinate  system.  If  the 
coordinate  frame  is  embedded  in  an  object,  then  a homogeneous  matrix  describes  the  relative 
position  and  orientation  of  this  object  with  respect  to  another  object  or  coordinate  frame  in  space. 
An  important  feature  of  homogeneous  coordinate  transformations  is  that  they  can  be  multiplied 
in  series  to  describe  the  position  and  orientation  of  one  object  with  respect  to  several  coordinate 
frames. 

The  homogeneous  transformation  used  to  describe  the  spatial  orientation  and  location  of  a machine 
structural  element  takes  the  form 


^Lt  ^2x  ^3x  Px 

^ly  ^2y  ^3y  Py 

^2z  ^3z  Pz 
0 0 0 1 


(1) 


where  the  vectors  Oj,  03  describe  the  axis  system  orientation  of  a coordinate  frame  with  respect 
to  another  coordinate  frame.  The  vector p is  the  position  vector  of  the  origin  of  the  first  coordinate 
frame  with  respect  to  the  second. 

Since  a machine  tool  can  be  considered  as  a chain  of  linkages,  an  approach  that  describes  the 
spatial  geometry  of  linkages  with  respect  to  a reference  frame  by  matrix  multiplications  can  be  used 
to  determine  the  spatial  relationship  between  the  cutting  tool  and  the  workpiece.  Thus,  by 
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multiplying  the  homogeneous  transformation  matrices  corresponding  to  a series  of  elements  such 
as  the  carriage,  cross-slide,  cutting  tool,  workpiece  and  spindle,  the  cutting  tool  and  the  workpiece 
position  can  be  described  with  respect  to  a conveniently  chosen  fixed  reference  frame  (Donmez  [1]). 
In  ideal  conditions,  the  cutting  tool  follows  the  contours  of  the  ideal  workpiece  geometry.  Thus, 
at  any  time  during  the  operation,  the  resultant  homogeneous  transformation  matrices, 

'^TOOL 


for  the  cutting  tool  and 


WORK 


(3) 


for  the  point  on  the  workpiece,  should  be  identical.  However,  due  to  the  individual  errors  involved, 
these  two  matrices  are  not  identical  in  reality.  The  total  error,  E,  is  represented  by  the  following 
equation: 


T =T  E 

^WORK  ^TOOL^ 


(4) 


The  equation 

F=T~^  T 

^ TOOL^  WORK 

gives  the  resultant  error  matrix. 


(5) 


As  discussed  earlier  homogeneous  transformations  can  be  multiplied  in  order  to  locate  a structural 
frame  with  respect  to  a reference  frame  such  as,  for  example,  the  machine  coordinate  system.  On 
a lathe,  for  example,  the  final  tool  transformation  may  be  a combination  of  transformations  starting 
with  the  horizontal  slide  (z),  then  the  vertical  slide  (x),  followed  by  the  turret  and  finally  the  tool. 
Thus  one  has  the  string  of  multiplications 

f = T TURRET'p 

^TOOL  -*2  -^TURRET  ^ TOOL  ^ ’ 

where  the  leading  superscripts  represent  the  most  immediate  reference  frame.  Thus  the  second 
transformation  describes  the  orientation  of  the  x-slide  system  with  respect  to  the  z-slide  system. 
The  z-slide  is  ultimately  referenced  to  the  machine  coordinate  system.  A similar  set  of  component 
matrices  combine  to  form  the  workpiece  transformation.  This  might,  for  example,  be  a combination 
of  the  spindle  coordinate  system  transformation  followed  by  a workpiece  transformation  with 
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respect  to  the  spindle. 


Many  of  the  components  of  these  matrices  are  small  so  that  the  matrix  multiplications  make  use 
of  the  algebra  of  infinitesimals.  That  is,  higher  order  terms  are  ignored.  However,  each  of  the 
matrices  are  composed  of  elements  that  are  functions  of  measured  quantities.  The  individual 
quantities  will  be  described  below. 

3.2  Predictive  Machine  Calibration 


The  error  vector  at  the  tool  tip  consists  of  individual  error  components  corresponding  to  different 
structural  elements  of  the  machine  tool.  This  is  the  result  of  the  product  of  the  homogeneous 
transformations.  In  order  to  predict  the  resultant  volumetric  error  at  any  location  and  at  any  time 
in  the  machine  work  zone,  all  of  these  individual  error  components  must  be  predicted.  Although 
these  components  are  geometric  error  components  of  the  structural  elements  of  the  machine  tool, 
their  characteristics  change  as  a result  of  such  factors  as  thermal  effects  and  loading  conditions. 
Currently,  only  the  effects  of  nominal  position  and  the  thermal  state  of  the  machine  on  the 
individual  error  components  are  considered.  Factors  such  as  tool  post  and  workpiece  deflection 
are  not  considered  although  the  mathematical  model  can  account  for  such  items.  Thus,  any  error 
(e)  can  be  considered  as  some  combination  of  nominal  position  (x)  and  temperature.  This  may  be 
expressed  as: 


e=a^+a^+a^  +-+bj'+bj'  +• 


(7) 


The  coefficients  in  this  equation  can  be  determined  by  using  least-squares  curve-fitting  techniques. 
The  data  used  in  this  analysis  is  obtained  by  making  error  measurements  while  monitoring  axes 
positions  and  temperatures  on  the  machine  structure.  A description  of  this  methodology  and 
related  uncertainties  due  to  data  fitting  for  a particular  machine  tool  are  given  in  Donmez  [5].  The 
need  to  only  measure  individual  axis  related  error  components  is  at  the  basis  of  the  rigid  body 
kinematic  assumption. 


Since  a coordinate  frame  is  assigned  to  each  element  of  the  machine  tool,  and  these  elements  are 
designed  for  shear  rather  than  for  bending,  it  is  possible  to  determine  the  error  at  any  nominal 
position  in  a work  zone  by  measuring  the  errors  along  the  orthogonal  axes  of  the  work  zone.  This 
procedure  decreases  the  number  of  measurement  points  significantly  since  it  eliminates  the  need 
for  grid  measurements  in  the  whole  work  zone. 
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3.3  Individual  Error  Contributions 


The  error  components  are  classified  into  four  groups  with  similar  characteristics,  measurement 
procedures  and  measurement  instrumentation.  The  four  groups  are:  1)  linear  displacement  errors, 
2)  angular  errors,  3)  straightness-parallelism-orthogonality  errors  and  4)  spindle  thermal  drift.  A 
schematic  of  all  of  the  contributing  errors  is  shown  in  Figure  1.  This  shows  the  nine  axis  related 
errors  that  contribute  to  the  total  work  volume  error  of  an  NC  turning  center. 

3.3.1  Linear  Displacement  Errors 

Linear  displacement  errors  are  errors  of  position  of  machine  elements  along  their  axes  of  motion. 
This  type  of  error  is  the  direct  result  of  drive  system  errors,  such  as  erroneous  lead  of  the  ballscrew, 
ballscrew  misalignment,  and  coupling  errors  between  the  ballscrew  and  the  feedback  system.  In 
addition,  displacement  errors  include  displacements  due  to  thermal  effects  on  the  ballscrew.  Since 
thermal  effects  are  correlated  with  the  effective  length  of  the  ballscrew,  the  linear  displacement 
errors  are  not  easily  decoupled  and  include  position  and  thermally-induced  components.  For 
example,  it  has  been  found  (see  Gilsinn,  et.  al.  [11])  that  the  linear  displacement  error  (6)  for  a 
specific  NC  lathe  at  a nominal  position  (x)  as  a function  of  ballscrew  bearing  housing  temperature 
(T)  can  be  described  by  an  equation  of  the  form: 

6 = Aq  + fljjc  + + b^T  + c^xT.  (^) 


where  T here  represents  a particular  thermocouple. 
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Figure  1:  The  component  errors  that  contribute  to  the  combined  X and  Z errors. 


3.3.2  Angular  Errors 


Angular  errors  are  rotational  errors  caused  by  geometric  inaccuracies  of  the  slideways  and  the 
misalignment  in  the  assemblies  of  structural  elements  of  the  machine  tool.  The  three  rotational 
errors  around  the  three  orthogonal  axes  of  the  machine  slide  are  defined  as  yaw,  pitch  and  roll. 
Yaw  error  is  the  rotational  error  of  the  slide  around  the  axis  perpendicular  to  the  plane  in  which 
the  axis  of  motion  lies.  Pitch  error  is  the  rotational  error  of  the  slide  around  the  axis  which  is  in 
the  plane  of  motion  and  perpendicular  to  the  axis  of  motion  of  the  slide.  Roll  error  is  the 
rotational  error  of  the  slide  around  the  axis  of  motion.  The  contributions  of  all  three  types  of 
rotational  errors  to  the  resultant  error  are  significant  in  three  or  more  axis  machining  centers. 
However,  in  turning  centers,  the  contributions  of  roll  and  pitch  errors  to  the  resultant  positioning 
error  of  the  cutting  tool  are  in  the  nonsensitive  direction,  thus  creating  second  order  errors.  The 
nonsensitive  direction  in  turning  centers  is  the  direction  perpendicular  to  the  plane  in  which  the  two 
machine  slides  are  moving.  Therefore,  yaw  error  measurements  are  sufficient  for  resultant  error 
calculations  for  turning  centers  of  the  type  considered  here.  The  yaw  errors  for  an  NC  lathe  have 
been  found  to  be  of  the  form: 


ey=aQ+a^x+a^^ +b^T  (9) 

where  the  subscript  y represents  the  yaw  error. 

3.3.3  Straightness-Parallelism-Orthogonalitv 

Slide  straightness  error  is  the  nonlinear  translational  movement  of  the  slide  in  the  two  orthogonal 
directions  other  than  its  axis  of  motion.  Although  straightness  errors  are  translation  errors  rather 
than  rotational  their  functional  form  for  an  NC  lathe  have  taken  a similar  one  to  the  form  of 
angular  errors  described  in  section  3.3.2.  Parallelism  and  orthogonality  describe  the  angular 
orientation  of  the  machine  axes  with  respect  to  each  other.  They  have  been  found  to  be 
polynomials  with  temperature  dependence  alone  (see  Donmez  [5]). 

3.3.4  Spindle  Thermal  Drift 

Spindle  thermal  drift  is  the  axial  and  radial  translations  along  the  axes  of  the  machine  tool  along 
with  tilt  motions  in  two  orthogonal  planes.  These  are  thermally  dependent  functions  only. 
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4.0  Some  Regression  Approach  Shortcomings 


For  the  turning  center  error  functions  described  above  two  data  sets  were  developed  for  each  error 
term  (Gilsiim,  et.al.  [11])  These  two  data  sets  represented  both  forward  and  backward  motions 
along  the  axis  of  motion.  The  equation  fitting  process  was  one  of  trial  and  error.  Engineering 
judgement  helped  determine  the  appropriate  variables  entering  each  regression  equation.  This 
judgement  is  based  upon  the  knowledge  of  cause  and  effect  relationships  between  temperature  rise 
and  expansion  of  metal.  The  procedure  did  not  lend  itself  to  be  automated,  however. 

Without  engineering  judgement,  the  regression  process  can  become  very  unwieldy.  In  particular 
for  the  turning  center  studied  40  thermocouples  were  used.  Approximately  600  data  samples  were 
taken  along  each  axis  of  motion.  A data  sample  included  a nominal  position  along  an  axis  of 
motion  and  readings  from  the  40  thermocouples  along  with  the  measured  error.  Thus  the  initial 
table  for  regression  would  be  approximately  600  rows  by  42  columns. 

If  a straightforward  regression  is  required,  to  find  equations  up  to  quadratic  order  would  require 
adding  42  x 42  more  columns  to  the  data  matrix  to  account  for  all  possible  product  terms.  The  end 
result  would  be  a matrix  of  approximately  600  rows  by  1800  columns.  Standard  regression  packages 
would  require  the  inversion  of  an  1800  x 1800  matrix  to  determine  the  regression  coefficients.  This 
is  not  a reasonable  approach. 

A stepwise  regression  procedure  could  begin  by  examining  the  correlation  matrix  between  all  of  the 
42  variables  and  select  the  first  variable  to  enter  the  regression  by  the  highest  correlation 
coefficient.  Next,  regression  would  be  plotted  against  each  of  the  other  variables  to  determine 
whether  any  of  the  other  variables  would  likely  enter  the  regression.  If  this  process  indicated 
another  variable  could  be  added  to  the  regression  equation  then  another  regression  fit  would  be 
performed  with  two  variables.  This  process  would  continue  until  all  variables  are  used  up.  Next 
columns  for  all  of  the  products  would  be  added  to  the  process.  Again  the  data  matrix  would  expand 
to  approximately  1800  columns. 

It  is  clear  that  standard  regression  procedures  are  cumbersome  when  there  are  a large  number  of 
variables  involved.  Physical  intuition  helps  reduce  the  effort  but  cannot  be  easily  introduced  into 
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an  automated  means  of  generating  the  error  component  function  maps.  For  this  reason  neural 
networks  are  a possible  alternative  for  generating  component  error  maps. 


5.0  Neural  Networks 

Neural  Networks  are  linkages  or  networks  of  artificial  processing  units  (or  neurons)  that  model  the 
behavior  of  biological  neural  systems.  Early  studies  of  biological  nervous  systems  guided  the 
structuring  of  neural  networks.  They  can  also  be  thought  of  as  a weighted  linking  of  distributed 
adaptable  processing  units.  These  networks  are  often  "taught”  by  presenting  examples  of  inputs  and 
outputs  to  them  and  adjusting  the  weights  and  adapting  the  processing  units  so  the  neural  network 
reproduce  as  nearly  as  possible  the  known  outputs  for  a given  set  of  inputs.  The  techniques, 
however,  do  not  differ  from  those  used  in  nonlinear  regression. 

Neural  networks  training  can  be  thought  of  as  solving  a problem  in  classical  approximation  theory 
(see  Girosi  and  Poggio  [12]).  This  theory  deals  with  the  approximating  or  interpolating  a 
continuous  function  f(x)  by  an  approximating  function  F(w,x),  where  w is  some  vector  of  unknown 
parameters.  The  approximation  problem  is  to  find  the  vector  w of  parameters  that  makes  F(w,x) 
match  f(x)  as  closely  as  possible  with  respect  to  some  predefined  measure  of  closeness.  Recent 
research  (see  G.  Cybenko  [13])  has  linked  the  class  of  neural  networks  called  multilayered, 
feedforward  neural  networks  very  closely  with  the  classic  approximation  problem  in  the  sense  that 
any  continuous  function  of  several  variables  over  finite  intervals  can  be  approximated  by  a linear 
combination  of  the  functions  defining  the  processing  units.  This  will  be  discussed  further  below. 

5.1  Parallel  Distributed  Processing  Model 

Neural  networks  can  also  be  throught  of  from  the  point  of  view  of  parallel  processing.  According 
to  Rumelhart,  et  al.  [14]  eight  elements  are  relevant  here: 

1.  Processing  Units 

2.  A state  of  activation  for  each  unit. 

3.  An  output  function  for  each  unit. 

4.  A connectivity  pattern  between  units 
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5.  A rule  for  propagating  patterns  through  the  network. 

6.  An  activation  rule  that  combines  inputs  to  a unit  to  produce  an  activation  level  for  that  unit. 

7.  A learning  rule  that  modifies  weights  based  on  sample  input  and  output  data. 

8.  An  environment  in  which  the  neural  network  operates. 

5.1.1  Processing  Units 

Processing  units  perform  the  simple  job  of  receiving  input  from  their  neighbors  in  the  connectivity 
structure,  compute  an  output  and  send  it  on  to  its  neighbors.  The  system  can  be  considered  parallel 
in  that  each  unit  can  carry  out  a computation  at  the  same  time. 

Three  types  of  units  make  up  a network:  input,  hidden  and  output.  Input  units  receive  data  from 
outside  sources  and  output  units  send  data  to  outside  destinations.  Hidden  units  are  those  whose 
inputs  and  outputs  lie  within  the  network  itself. 

5.1.2  State  of  Activation 


As  understood  here,  activation  values  are  continuous  values  between  0 and  1.  When  a processing 
unit  receives  a weighted  sum  of  inputs  from  connected  neighboring  cells  it  produces  an  output  or 
activation  level.  This  is  a number  between  the  minimum  of  0 and  the  maximum  of  1.  Mathemati- 
cally this  can  be  expressed  as: 


/ n 


yi=fi 


V=l 


W,jX. 


0. 


(10) 


where  represents  the  output  activation  signal  from  the  i-th  processing  unit,  represents  the 
weight  of  the  j to  i interconnection,  represents  the  threshold  value  of  the  i-th  processing  unit. 
The  type  of  transformation  given  by 

t - e,- 

y=i 


is  called  an  affine  transformation. 
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The  input  to  a processing  unit  can  be  either  the  output  from  connected  neighboring  units  or  directly 
from  outside  the  network.  The  output  from  the  i-th  processing  unit  can  be  used  either  as  input  to 
neighboring  units  or  as  output  from  the  network.  The  value  of  the  weight  determines  how 
strongly  the  output  of  the  j-th  processing  unit  influences  the  activity  of  the  i-th  processing  unit. 

5.1.3  Output  Function 

The  sigmoid  (or  S-shaped)  output  function  is  used  in  practice  to  determine  the  level  of  activation 
of  a processing  unit.  For  the  i-th  processing  unit  it  is  in  general  specified  as; 


where 


= 


1 

1 + e ' 


(12) 


a. 


-E 

y=i 


- e. 


(13) 


The  threshold  6-  acts  as  a filter  for  incoming  signals  and  a-  is  referred  to  as  the  activation  of  the 
i-th  processing  unit.  It  is  transformed  by  the  sigmoid  function  to  determine  the  magnitude  of  the 
current  output  signal  from  the  i-th  unit,  c is  a constant  which  determines  how  steeply  the  i-th 
processing  unit  ascends  from  a minimum  to  a maximum  value.  It  is  taken  as  1 in  this  report.  The 
smaller  it  is  the  slower  the  ascent,  the  larger  it  is  the  more  rapid  the  ascent.  In  particular  for  large 
values  the  sigmoid  function  begins  to  approximate  a step  function.  The  computational  unit 
described  in  sections  5.1.1  to  5.1.3  is  usually  called  a perceptron.  This  model  is  shown  in  Figure  2. 
This  figure  shows  the  weighted  input  to  a processing  unit  from  three  sources.  The  output  y is 
computed  as  a sigmoid  function  of  an  affine  transformation  of  the  inputs.  The  output  is  shown 
being  sent  to  three  other  units. 

5.1.4  Connectivity 

Processing  units  are  linked  to  each  other.  Units  connected  to  others  by  links  in  the  network  are 
sometimes  referred  to  as  neighbors.  Each  link  in  the  network  has  a weight  assigned  to  it  as 
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described  above.  The  total  pattern  of  links  can  be  represented  by  specifying  the  weights  for  each 
of  the  connections  in  the  network.  A positive  weight  represents  an  excitation  input  and  a negative 
weight  represents  an  inhibitory  input  to  a processing  unit.  The  absolute  value  of  the  weight 
specifies  the  strength  of  the  connection.  The  term  fan-in  is  sometimes  used  to  represent  the 
number  of  units  that  generate  inputs  to  a given  unit  and  the  term  fan-out  is  the  number  of  units 
affected  by  a given  units  output. 

Network  connectivity  architectures  can  be  divided  into  two  types:  recurrent  and  non-recurrent. 
Recurrent  networks  are  those  in  which  loops  exist.  That  is,  a units  output  can  by  way  of  various 
links  in  the  network  ultimately  return  to  affect  its  own  input.  A network  is  non-recurrent  if  a units 
output  does  not  affect  its  input.  An  important  subclass  of  non-recurrent  networks  are  those  in 
which  the  processing  units  are  organized  in  layers  and  only  one  way  links  are  allowed  between 
adjacent  layers.  These  networks  are  called  feedforward  networks.  Feedforward  networks  are  called 
fully  connected  if  each  unit  on  a layer  is  connected  to  every  unit  on  an  adjacent  layer.  Thus  if  there 
are  n units  on  one  layer  and  m units  on  an  adjacent  layer  then  there  are  n x m connetions. 

A feedforward  network  is  in  general  a multilayered  neural  network  in  which  perceptrons  are 
arranged  in  layers,  Where  the  output  of  one  layer  becomes  the  input  to  the  next  layer.  Ordinarily 
the  outputs  of  the  final  layer  are  weighted  and  summed  together  to  produce  the  output,  but  the 
final  layer  can  also  act  as  perceptrons.  This  is  the  type  of  network  used  in  the  current  study  and 
is  illustrated  in  Figure  3 in  which  there  are  three  input  nodes,  four  hidden  nodes  and  two  output 
nodes. 

5.1.5  Rule  of  Propagation 

A rule  of  propagation  is  one  that  specifies  how  output  values  from  nodes  are  to  be  combined  with 
the  connectivities  to  produce  inputs  to  neighboring  nodes.  Since  the  networks  used  in  the  current 
study  are  fully  connected  feedforward  networks  the  overall  propagation  rule  can  be  precisely 
specified. 

The  network  used  in  this  study  and  described  here  is  a three  layer  neural  net  with  one  input  layer, 
one  hidden  layer  and  one  output  layer.  Let  there  be  I input  nodes,  H hidden  nodes  and  J output 
nodes.  Each  input  node  is  connected  to  each  of  the  hidden  nodes  and  each  of  the  hidden  nodes 
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Figure  2.  Perceptron  model  of  a neural  network  processing  unit. 
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is  connected  to  eveiy  output  node.  This  means  that  there  are  I x H x J connections  between  input 
nodes  and  output  nodes.  Let  be  the  external  input  to  the  network  at  the  i-th  input  node  where 
i = 1,  ...,  I.  h&tyfj  be  the  output  from  h-th  hidden  node  andzy  be  the  output  from  the  j-th  output 
node.  Let  wlf^-  be  the  weight  on  the  link  that  connects  the  i-th  input  node  with  the  h-th  hidden 
node  and  let  w2j^  be  the  weight  on  the  link  that  connects  the  h-th  hidden  node  with  the  j-th  output 
node.  In  order  to  simplify  the  notation  slightly  the  threshold  values  in  the  nodal  activations  can  be 
considered  as  weights  in  the  sum  and  a dummy  input  node  and  hidden  node  can  be  added  with  a 
constant  input  value  of  1.  That  is, 

X,.,  = 1.0  (14) 


at  the  dummy  I+l  input  node  and  a constant  output  at  the  dummy  hidden  node  of 

yg.i  = 10 


(15) 


The  output  function  at  each  hidden  node  and  output  node  will  be  the  sigmoid  function  with 
c = 1.0. 


With  this  notation  the  propagation  rule  can  be  stated  as  follows:  For  the  j-th  hidden  node,  j = 1, 
...,  J compute 


yh=f 


//+i  \ 

\i=i 


(16) 


where  — l-O-  Next  compute  for  each  of  the  J output  units  the  following: 


V=i 


(17) 


where  2 = 1.0.  The  output  function  at  each  processing  node  is  given  by  the  sigmoid  function 


f(s)  = 


1 +e 


(18) 


Therefore  given  any  input  vector  x = ( Xp...,Xj  ) the  propagation  rule  above  produces  an  output 
vector  z = ( ) at  the  final  node  layer. 
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5.1.6  Activation  Rule 


This  is  a rule  by  which  the  inputs  to  a processing  nodes  are  combined  and  operated  on  by  the  nodal 
output  function  to  produce  a new  activation  level  at  the  node.  This  process  has  been  described 
mathematically  in  section  5.1.2. 

5.1.7  Connectivity  Modification 

Changing  the  connectivity  of  a parallel  distributed  processing  model  changes  its  "knowledge".  This 
can  be  done  by: 

1.  Developing  new  connectivities. 

2.  Losing  some  connectivities. 

3.  Modifying  the  strengths  of  existing  connectivities. 

Developing  new  connectivities  and  losing  connectivities  can  be  considered  special  cases  of  3 so  that 
this  study  will  consider  only  means  of  modifying  existing  strengths. 

By  far  the  most  popular  technique  for  modifying  the  connectivity  strengths  is  called  the  Back- 
Propagation  Algorithm.  Other  algorithms  have  recently  shown  better  performance  and  one  of  these 
will  be  discussed  below.  These  algorithms  are  essentially  optimization  algorithms. 

5.1.8  The  Usage  Environment 

Separate  implementations  of  a feedforward  neural  net  were  applied  to  develop  a selected  number 
of  component  errors  that  enter  into  the  kinematic  turning  center  error  correction  model.  Weights 
were  developed  to  predict  the  linear  displacement  error  of  one  axis  of  the  machine  using  existing 
data.  The  inputs  were  axis  locations  and  thermocouple  readings.  The  outputs  were  individual  error 
values  for  this  particular  error  component. 

If  all  of  the  error  maps  were  developed  by  the  method  of  neural  nets,  then  the  weights  for  each  of 
the  component  errors  entering  the  kinematic  model  would  be  looked  up  and  the  neural  net 
propagation  rule  evaluated  to  produce  error  values  for  each  component.  These  component  errors 
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would  be  cxjmbined  in  the  kinematic  model  to  produce  the  spatial  error  of  the  turning  center  at  the 
given  nominal  position  and  machine  temperature  state. 

5.2  Learning  Paradigm 

The  learning  approach  used  to  modify  the  connectivity  strengths  is  based  on  the  paradigm  of 
associative  learning.  In  this  paradigm  the  neural  net  adapts  its  weights  to  produce  a particular 
pattern  of  connectivities  and  activations  on  the  output  units  when  particular  input  data  patterns  are 
presented  to  the  input  units.  The  goal  is  to  find  a set  of  connectivity  weightings  and  threshold 
values  so  that  when  a particular  pattern  of  data  appears  at  the  input  then  the  associated  pattern 
of  output  data  appears  at  the  output  units.  The  connectivity  strengths  are  modified  iteratively  to 
try  to  teach  the  network  (i.e.  modify  the  connectivity  strengths  ) so  that  during  the  training  (or 
iterative  modification)  process  the  output  data  patterns  for  given  training  input  patterns  are  made 
to  more  closely  match  the  desired  or  target  output  data  patterns  for  the  given  input  patterns. 


6.0  Gradient  Descent  Algorithms 


Supervised  training  of  a neural  net  involves  the  reduction  of  some  error  value.  To  assess  the  overall 
training  (or  weight  adjustments)  a cumulative  error  over  all  data  patterns  in  the  training  set  must 
be  computed.  In  particular  let  there  be  P data  patterns  in  the  training  set  and  J the  number  of 
output  nodes.  Let  Zp  = {Zp2,...,Zpj)  be  the  p-th  output  data  pattern  for  the  input  vector  Xp  = 
(Xpp...,Xpj).  Furthermore,  let  tp  = (tpj,...,tpj)  be  the  desired  or  target  data  pattern  for  the  p-th  input 
Xp.  Then  the  root-mean-square  normalized  error  for  the  training  set  is  given  by 


1 


E E 


p=i  j=i 


(19) 


19 


For  the  purpose  of  developing  the  algorithms,  however,  let 


p j 


p^l  /=l 


(20) 


and  then 


-i-yii. 

<[Ri 


(21) 


In  the  next  several  sections  a detailed  discussion  of  the  theoretical  background  of  the  computer 
code  ^ven  in  Appendix  A will  be  ^ven.  Much  of  the  description  of  the  optimization  algorithm  is 
based  on  M0ller  [9].  The  notation  used  in  6.2  is  motivated  by  that  used  in  Rumelhart  [14].  The 
general  optimization  scheme  is  shown  in  Figure  4. 

6.1  Algorithm  Terminology 

Vector  notation  will  be  used  to  simplify  equations  as  much  as  possible.  In  particular,  a network 
weight  vector  is  a vector  in  real  euclidean  space  R^,  where  N is  the  total  number  of  link  weights  and 
nodal  threshold  parameters.  Let 

where  wlj^j  is  the  weight  on  the  link  that  cormects  the  i-th  input  unit  with  the  h-th  hidden  unit.  The 
superscript  T refers  to  the  transpose  of  the  vector. 

Let  the  cumulative  error  function  equation  (20)  in  section  6.0,  be  considered  a function  of  w and 
denoted  by  E(w).  The  gradient  E*(w),  is  ^ven  by 


E\w)  - 


dE 


dE  dE 


dE 


dE 


dE 


dE 


dwl 


11 


dwii^^i  dwl^^ 


dwl 


HI 


dwlnj^i  dw2^, 


dw2 


1^+1 


(23) 
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Figure  4.  Schematic  of  the  optimization  strategy. 


The  sum  and  difference  of  two  vectors,  w,  v,  in  are  given  by 


W + V = 
W - V = 


(24) 


The  inner  product  of  two  vectors,  w,  v,  in  is  given  by 

w.y.  (25) 

i=l 


The  length  or  norm  of  w is  given  by 


|w| 


N 


(26) 


The  Taylor  expansion  in  vector  form  for  E(w)  can  be  expressed  as 

E(w+y)  = E(w)  + EXw)^y  + ^y^E''{w)y  +...  (27) 

where  E”(w)  is  an  N x N matrix  called  the  Hessian. 

An  N X N matrix  A is  said  to  be  positive  definite  if 

y'^Ay  > 0 (28) 

for  all  y in  R^. 

Finally,  let  P2,...,pj^  be  non-zero  vectors  in  R^.  This  set  of  vectors  is  said  to  be  a conjugate  system 
with  respect  to  a non-singular  symmetric  N x N matrix  A if 

pjAp.  = 0 (29) 

for  all  i,j  = l,...,k,  i ^ j. 
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6.2  Computing  the  Gradient 


To  begin  the  cx)mputation  of  the  gradient,  E’(w),  note  that  if  f(u)  is  the  sigmoid  function 


then 


3^  =y(w) 


1 

l+e-“ 


(30) 


^ = /(«)  = /(«)(!  -/(«))  = y (i-y)  (31) 

du 


For  the  fully  connected  feedforward  network  described  in  5.1.5  introduce  the  notation 


7+1 


(32) 


i=l 


where  h = 1,2, ...,H  for  the  hidden  units  and  p represents  the  p-th  pattern.  The  output  of  the  h-th 
hidden  unit  is  then  given  by 


yph  =f{^h)  = 


1 


1 +e 


-sIl 


(33) 


The  derivative  of  ypj^  with  respect  to  sl^  is  then  given  by 


(34) 


from  the  property  of  the  sigmoid  function  derivative.  This  derivative  is  referred  to  as  hderiv(h)  in 
the  program  given  in  Appendix  A. 


For  each  of  the  output  units,  j = 1,2,...,J,  let 


£7+1 


s2.  = w2.,y^ 


(35) 


h=l 
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The  output  of  the  sigmoid  function  is  then  given  by 


= m = 


1 ~^i 

1 +e  ^ 


The  derivative  is  given  by 


dz 


pj  _ 


ds2: 


= -Zpy) 


and  is  referred  to  as  oderiv(j)  in  the  program  in  Appendix  A. 


Let  the  contribution  to  the  total  error  by  the  p-th  data  pattern  be  given  by 


The  total  error  is  then  given  by 


p 


p^i 


For  notation  let 


Note  that  the  gradient  component 


BE  _j. 

dw2jk  „.i  dw2jk 

depends  on  only  the  sum  s2j  for  the  j-th  output  unit  since  the  output  at  the  j-th  output 
only  on  w2jjj,  h = Therefore  by  the  chain  rule  of  partial  differentiation 


(36) 

(37) 

(38) 

(39) 

(40) 

(41) 

it  depends 
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(42) 


a£  a£  ds2, 

dw2jf^  ds2j  dw2jf^ 

But  from  the  definition  of  s2j, 

ds2. 


(43) 


Therefore 


(44) 


To  complete  this,  one  must  compute  6'’^.  Referring  to  the  definition  of  Ep 


ds2j 


(^PJ  ^Pj) 


ds2j 


since  the  output  at  the  j-th  output  unit,  Zpj,  is  the  only  one  affected  by  s2j.  But  then 


^ = -{hj- ^Pj)h>j{^  -^Pi) 


(45) 


(46) 


and  therefore 


(47) 


This  is  referred  to  as  delta2(p,j)  in  the  program  in  Appendix  A.  Finally, 


dE 

dw2j^ 


P 


-E 


(48) 


In  the  program  in  Appendix  A 


gm2UM)  = 

dw2, 


(49) 
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To  compute  the  other  components  of  the  gradient 


JE_  f 3. 


(50) 


note  that  contributes  only  to  the  sum  slj^.  Therefore,  from  the  chain  rule, 

dE^  dE^  dsl, 

dwl,^  dsli,  dwl^ 


(51) 


But 


dsh 

dwl^ 


= X 


pt 


(52) 


Then,  if  one  defines 


the  equation  above  yields 


dwi^^ 


(53) 


(54) 


and  it  is  left  to  compute  Since  slj^  only  affects  the  output  of  the  h-th  hidden  unit  one  can  use 
the  chain  rule  and  compute 


^ph 


(55) 


But  then 


dE„ 

st 


dE 

p 

^ph 


ypK(^ 


-yph) 


(56) 


From  the  definition  of  Ep 
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(57) 


^ph 


J 


-E  {tpj  - hi) 


^ph 


But 


or 


^ph  ^2,-  ^ph 


(58) 


^ph 


(59) 


Then 


El 

^pH 


J 


-E('p/ 


(60) 


Then 


El 

dsl. 


-y^(i  - yp/.)E  (‘pj  - Zp;)Zp/(i  - 


7=1 


(61) 


or 


6J<.  =ypi.{^  - ypk)T, 

7=1 


(62) 


which  is  referred  to  as  deltal(p,h)  in  the  program  and  finally 


dE 

dwi^ 


= -E 


p=l 


^yH^pi 


(63) 


Note  that  in  the  program  in  Appendix  A 
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(64) 


dE 

dwl,, 


The  order  in  which  the  gradient  components  are  compute  is  a reverse  order.  That  is,  for  each 
pattern  p,  5^^  is  computed  for  each  output  unit,  j = Then  the  gradient  component 


dE 

dw2j^ 


P 


= -E 

p^i 


(65) 


is  computed.  Next,  for  each  hidden  unit,  h = 1,...,H,  one  computes 

= yp*(i  - yp*)E 


and  sets 


dE 

dwl^ 


p 


-E  ^’’yh^pi 


(66) 


(67) 


This  is  the  reason  that  this  method  of  computing  the  gradient  is  referred  to  as  the  method  of  back 
propagation.  The  term  delta  rule  is  sometimes  used  for  obvious  reasons.  The  implementation  of 
this  computation  is  given  in  the  subroutine  FORWARD  in  Appendix  A. 

6.3  Descent  Strate^ 

The  idea  of  the  strategy  used  to  minimize  the  error  function  E(w)  is  given  in  the  following  steps: 

1.  Select  an  initial  weight  vector  and  set  n — 1. 

2.  Determine  a search  direction  and  a step  size  so  that  if  E(Wjj  + Aw^)  < E(w^)  where  Aw^ 
= a^Pn* 

3.  Compute  E’(Wjj)  and  update  the  current  weight  vector:  + Aw„- 
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4.  If  E’(Wjj)  ^ 0 then  update  the  iteration  counter  n = n + 1 and  go  back  to  step  2.  Otherwise, 
return  as  the  desired  weight  vector  that  minimizes  E(w). 

Various  implementations  of  this  strategy  differ  at  step  2 in  how  to  select  a search  direction  and  step 
size. 

6.4  Back  Propagation  Algorithm 

The  back  propagation  terminology  should  be  applied  strictly  to  the  method  of  computing  the 
gradient  of  the  error  function  as  done  in  section  6.2.  It  is,  though,  generally  applied  to  the  method 
of  updating  the  weight  vector  by  stepping  in  the  dirction  of  the  steepest  descent. 


To  identify  each  iterative  step,  n will  be  used  to  designate  the  n-th  step.  At  the  n-th  step  the  weight 
update  rule  for  the  back  propagation  algorithm  is 


Awl^in) 


dE 


dwl 


(n) 


hi 


p=l 


Pt 


(68) 


and 


Aw2jf^(n)  = -Ti 


dE 

dw2ji, 


(«)  = >1 Z 


P=1 


'ph 


(69) 


where  r{  is  the  multiplier  that  specifies  the  step  size.  It  is  sometimes  referred  to  as  the  learning 
rate.  The  index  n indicates  values  computed  at  the  n-th  stage  of  the  algorithm.  The  updated 
weights  are  often  computed  as 


Aw2j^(n) 


““H  — (w)  + aAwL.(n-l) 
dwlf^  ^ 

-r\  (n)  + aAw2.,(n-l) 

dw2.^ 


(70) 


The  last  term  in  the  sum  is  called  the  momentum  term  and  is  used  as  a means  of  increasing  the 
convergence  rate.  The  back  propagation  update  procedure  is  equivalent  to  setting 
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Pu  = -E'(y^) 


(71) 


in  6.3  and  ~ r|,  a constant.  This  does  not  include  the  momentum  term.  The  end  result  is  a 
minimization  procedure  based  on  the  first  order  Taylor  series  approximation 

£(w+y)  ~ E{w)  + E'{wYy  (7^) 


This  linear  approximation  is  one  reason  for  the  poor  convergence  record  of  the  back  propagation 
algorithm.  Hie  use  of  a constant  step  length  does  not  give  the  algorithm  the  robustness  of  an 
adaptively  adjusted  step  size.  Adding  the  momentum  term  as  an  attempt  to  make  the  algorithm 
approximately  second  order  adds  another  constant  multipier  that  again  adds  to  the  lack  of 
robustness  of  the  algorithm.  In  fact  if  the  learning  rate  parameter  is  not  selected  properly 
oscillations  in  the  error  function  can  be  generated  without  producing  any  descent. 

6.5  Scaled  Conjugate  Gradient  Algorithm 

The  algorithm  in  this  section  is  due  to  M0ller  [9]. 

Assume  that  the  error  function  E from  equation  (20)  can  be  approximated  locally  (i.e.  in  a 
neighborhood  of  a weight  vector  w)  by  a quadratic  function  of  the  form 

^?wCy)  " -^(^)  E'{wYy  + ~y'^E'\w)y  (73) 

where  the  subscript  q stands  for  quadratic.  The  minimum  of  Eq^(y)  is  among  the  critical  points 
which  are  found  by  solving 

E^(y)  = E"(w)y  + E'(w)  = 0 (74) 

for  y.  If  the  Hessian,  E”(w),  is  positive  definite  then  there  is  a unique  global  minimum.  The 
minimum  can  be  computed  iteratively.  Two  issues  arise:  How  to  select  the  search  directions  and 
how  to  maintain  the  local  Hessian  positive  definite  for  a non-quadratic  function. 

Let  pj,...,pj^  be  a conjugate  system  with  respect  to  the  Hessian.  pj,...,pj^  forms  a basis  for  R^.  The 
vector  from  the  initial  point  y^  to  the  minimum  y*  can  be  expressed  as  a unique  linear  combination 
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of 


N 


y.  -yi=T,  «-iPi 

i=l 


(75) 


for  some  aj’s  real.  The  aj’s  can  be  computed  as  follows:  Multiply  the  Eq.  (75)  by  p jE”(w)  and  get 

pl(E"{w)y,  - £V)y,)  = <tjplE"(w)p.  (76) 


Using 


gives 


£V)  = -E"{w)y^ 


p/(-E\w)  - E"(w)y,)  = ajp/E"(w)p. 


T -nil/ 


(77) 


(78) 


Noting  that 


one  can  solve  for  aj  as 


= E"(w)y,  + E'(w) 


^ ^ -PjE^jy,) 
’ pjE"(vi)p. 


(79) 


(80) 


Therefore  the  minimum  of  the  quadratic,  y*,  can  be  computed  in  N iterative  steps.  This  can  be 
restated  as  follows.  Let  p^v jPn  ^ conjugate  system  relative  to  the  Hessian  and  let  be  an 
initial  point  in  the  weight  space.  Determine  the  points  y2»— jYn+i  recursively  by 

y*.!  =yt*  <^tPk 

where 


and 


(82) 
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(83) 


§4  = PtE"(w)p^ 


Then  y^j+j  minimizes  Eq„(y). 


It  is  not  necessary  to  assume  that  a conjugate  system  be  known  ahead  of  time.  It  is 

possible  to  compute  them  iteratively  along  with  the  computation  of  y2J—jyN+i  (Hestenes[15]).  In 
fact,  be^n  with  in  the  weight  space  and  take  initially 

the  initial  steepest  descent.  Then  compute  recursively 

yk*i  = y*  * ^kPk 


where  aj^  is  given  above  in  Eq.  (82)  and 


(86) 


Set 

Pk*i  = '•**1  ^kPk 

where 

(88) 


That  is,  is  determined  recursively  as  a linear  combination  of  the  current  descent  direction, 
2,  and  the  previous  direction  pj^.. 

Since  the  error  function,  E(w),  is  not  necessarily  quadratic  the  algorithm  will  not  necessarily 
converge  in  N steps.  If  it  does  not,  then  the  algorithm  is  started  again  with  the  steepest  descent 
at  the  current  point  and  continued. 
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At  this  point  if  one  assumes  that  E”(w)  is  positive  definite  then  the  descent  strategy  from  section 
6.3  for  the  conjugate  gradient  algorithm  can  be  stated  as  follows: 

1.  Initialize.  Set  the  iteration  counter  k = 1.  Select  an  initial  weight  vector  w^.  Select  the  steepest 
descent  at  as  the  initial  direction 

p,  = r,  = -£'(w,)  (89) 


2.  Calculate  the  current  step  length 


5^  = 

^ Pk  ^k 

= Plh 


3.  Update  the  weight  vector 

^kPk 


(90) 


(91) 


4.  Get  the  new  descent  direction 


;fc+i 


= -^Wi) 


(92) 


5.  If  the  iteration  counter  k is  a multiple  of  N then  restart  the  algorithm  by  setting  the  direction 
to  the  current  steepest  descent  and  continue: 


Pk*l  ^k*l 


(93) 


6.  If  k is  not  a multiple  of  N then  compute  a new  conjugate  direction  by  setting 

7.  If  the  steepest  descent  0 then  increment  the  iteration  counter  k = k + 1 and  go  back  to 
step  2 to  get  a new  step  length. 
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(94) 


Pt*i 


8.  If  Fjj  = 0 then  return  ^ as  the  minimum. 


For  each  iteration  of  this  version  of  the  algorithm  the  Hessian  E”(wj^)  must  be  computed  and 
stored.  This  can  be  a large  matrix.  Another  approach  is  to  estimate  Sj^  in  step  2 above  by  the 
difference  quotient  (Hestenes  [15]) 


s.  = E%w,)p,  « 


(95) 


where  0 < Oj^  <<  1.  But  this  does  not  totally  solve  the  minimization  problem. 


The  previous  algorithm  assumed  that  E”(w)  was  globally  positive  definite  and  that  the  quadratic 
approximation  is  a good  approximation  of  E(w).  Two  devices  have  to  be  introduced  to  assure  this. 
The  first  device  is  to  introduce  a positive  scalar  parameter,  by  setting 


EVi  + OjP*)  - £H)  , , . 
:: Vi 


(96) 


The  parameter  (called  a Marquardt-Levenberg  parameter)  is  chosen  to  control  the  positive 
definiteness  of  E”(wjj).  The  second  device  is  introduced  to  control  how  good  the  quadratic 
approximation  is.  This  is  done  by  setting 

^ ^ gCWj)  - E(yv^*atPt) 

‘ £(Wj)  - EJ,a^p^ 

Ajj  measures  how  well  Eq^(ajjPjj)  approximates  E(wjj+ajjPjj).  If  Aj^  is  close  to  1 it  indicates  a good 
approximation. 

At  each  iteration  the  sign  of  in  step  2 above  tells  the  definiteness  of  E”(wij)-  If  > 0 it  is 
positive  definite.  If  < 0 then  must  be  adjusted  to  make  5jj  > 0.  The  adjusted  value  is 
computed  as 
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I 


(98) 


A new  5j^  is  computed  as 

K = - h)\Pkf  = -6*  - h\p^f  (59) 


The  new  step  size  is  then  given  by 


a,.  = 


6. 


\^k 


T 


h\Ptf 


(100) 


As  a note  here,  the  bigger  the  smaller  the  step  size  The  strategy  (see  Fletcher  [16])  in 
adjusting  also  involves  testing  the  measure  Aj^.  In  particular,  if  > 0.75,  can  be  relaxed  by 
setting 

K = (101) 


This  allows  longer  step  lengths.  If  Aj^  < 0.25  then  X^  is  made  larger  by 


(102) 


This  forces  shorter  steps  in  regions  where  the  quadratic  assumption  is  weaker. 

The  final  scaled  conjugate  gradient  algorithm  (M0ller  [9])  can  now  be  stated: 

1.  a.  Select  an  initial  weight  vector  and  scalars  a > 0,  X^  > 0,  and  X^  = 0. 

b.  Set  the  initial  search  direction  to  the  steepest  descent  and  the  iteration  counter,  k,  to  1,  i.e. 

= -E’(wi)  k = 1. 

c.  Set  a logical  flag  to  true  to  indicate  that  a successful  step  to  reduce  the  error  function  can  be 
made:  success  = .true. 
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2.  If  a successful  reduction  in  error  cannot  be  made,  i.e.  success  = .false.,  then  go  on  to  step  3. 
If  a successful  reduction  in  error  can  be  made,  i.e.  success  = .true.,  calculate  new  second  order 
terms: 


a 


5 


k 


T 

Pk^k 


(103) 


3.  Scale  the  approximate  Hessian,  Sj^,  and  the  definiteness  parameter,  5jj.: 

^k  ~ ^k  (^k~^k)^k\^ 


(104) 


4.  If  the  approximate  Hessian  is  positive  definite,  5^^  > 0,  then  go  to  step  5 to  calculate  a new  step 
size.  If  the  approximate  Hessian  is  not  positive  definite,  < 0,  then  make  it  positive  definite: 


^k=  ^k^ 


( 6.  ) 
* 


A,  = 2 


X.-— 2 

'■'til 


^tir 


ip 


^k  ~ _^k 
^k  ~ ^k 


(105) 


5.  Calculate  the  step  size: 


\^k  = Pkh 


(106) 


6.  Calculate  the  comparison  parameter  that  measures  how  close  the  error  function  is  to  quadratic 
at  the  current  weight  vector: 
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(107) 


A 


k 


26^[£(w^)-£(w^  + g,/7^)] 

2 

\^k 


1.  If  Aj^  > 0 then  a successful  reduction  in  error  can  be  made.  Set  success  = .true,  and 

^kPk 

'•w  = -■£'(«'»..)  (108) 

=0 


7a.  Test  whether  k is  a multiple  of  N. 


7a.l.  If  so  then  restart  the  algorithm  by  setting  the  current  direction  to  the  current  steepest 
descent,  and  go  on  to  step  7b. 


7a.2.  If  not  then  create  a new  conjugate  direction  to  step  in  by  setting: 

Pjt 

Pk^i  = '■*.1  ^kPk 


(109) 


7b.  If  Aj^  > 0.75  then  the  quadratic  approximation  is  considered  trustworthy  and  the  scale 
parameter  for  the  Hessian  can  be  reduced  by  setting 


k 


k 


(110) 


7c.  If  Ajj  < 0 then  a reduction  in  the  error  function  is  not  possible.  Set  success  = .false,  and 

r.  = X,  (111) 


If  there  are  too  many  failures  in  a row  terminate  the  algorithm,  otherwise  continue  to  step  8. 
8.  If  Ajj  < 0.25  then  the  quadratic  approximation  is  considered  untrustworthy  and  the  Hessian  scale 
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parameter  is  increased  to  reduce  the  step  size  by  setting 


(112) 


9a.  If  the  steepest  descent  is  nonzero,  rj^  ^ 0,  then  update  the  iteration  counter,  k = k+1,  and  go 
back  to  step  2. 

9b.  Otherwise,  terminate  the  algorithm  and  return  the  weight  vector  for  the  desired 

minimum. 

The  implementation  by  Dr.  James  Blue  at  NIST  of  M0ller’s  algorithm  is  given  in  the  subroutine 
OPTWTS  in  Appendix  A.  For  a comparison  of  the  numerical  advantages  of  the  scaled  conjugate 
gradient  technique  versus  the  straightforward  back  propagation  method  see  Grother  and  Blue  [8]. 


7.0  Adjusting  for  111  Conditioning 


The  discussion  in  this  section  is  based  on  the  results  of  Tychonov  and  Arsenin  [17].  Since  some 
problems  have  solutions  that  are  sensitive  to  small  computational  errors  they  have  introduced  a 
technique  that  can  be  used  to  reduce  this  sensitivity.  It  is  called  the  method  of  regularization. 
There  are  other  methods  but  in  this  section  only  the  technique  used  in  the  program  in  Appendix 
A will  be  discussed.  For  a discussion  of  other  methods  see  Saarinen,  Bramley  and  C^benko  [18]. 

Let  w be  an  n-dimensional  real  vector  and  z an  m-dimensional  real  vector  where  n is  greater  than 
or  equal  to  m.  The  object  is  to  find  w such  that 

F(w)  = z (113) 

for  a given  z.  The  solution  of  this  problem  comes  about  by  finding  a map,  R,  such  that 

w = F(z).  (114) 


Since  there  might  be  no  unique  w to  solve  (113),  R may  not  be  uniquely  defined. 

In  general,  R in  equation  (114)  will  be  called  stable  if  w depends  continuously  on  z.  More  formally, 
define  a metric  in  n-dimensional  space  by 
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(115) 


Wj  - 


IV,  = 


N 


E - ’*’2,)' 


i=l 


For  the  moment  assume  that  to  every  z there  is  a unique  w such  that  w = R(z).  The  problem  of 
finding  R is  said  to  be  stable  if,  for  every  s > 0,  there  exists  a 5(s)  such  that 

||z,  - ZjI  i 6(€)  - ||iv,  - W2II  £ e (116) 


where  = RCz^),  W2  = R(z2).  Thus,  small  perturbations  in  z lead  to  small  perturbations  in  w. 
A stable  problem  is  also  called  a well  posed  problem. 

The  problem  of  finding  an  approximate  solution  of  (113),  given  z,  in  an  ill  posed  case,  i.e.  not  stable 
as  defined  above,  may  be  ambiguous  in  that  there  may  be  multiple  solutions.  Suppose  that  in  (113) 
z is  only  known  approximately  as  z^  where 

Iz  - z^l  £ 6.  (117) 

Let 

<?6  M- 


The  problem  of  finding  an  approximate  solution  for  (113)  reduces  to  selecting  an  appropriate  w 
from  Qg.  This  set  may  be  too  large  so  that  not  every  element  can  be  selected. 

Suppose  that  F(wq)  = Zq.  An  operator  R(z,a),  depending  on  a parameter  a,  is  called  a regularizing 
operator  for  F(w)  = z if 

(1)  there  is  a 5q  > 0 such  that  the  operator  is  defined  for  every  a > 0 and 

Iz  - ZolM  S ^ 

and 

(2)  there  exists  a function  a = a(5)  such  that  for  every  s > 0 there  exists  6(s)  < 6q  such  that 
where  Wg  = R(zg,a(5)).  This  says  that,  given  a regularizing  operator  for  (113),  if 
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(120) 


1^6  - Zo|h  “ Ihs  ■ 


Ih  - Zol  ^ « (121) 

then  Wg  = R(z8,a(6))  can  be  taken  as  an  approximate  solution.  This  solution  is  called  the 
regularized  solution  and  a the  regularizing  parameter.  TTie  definition  also  implies  that  the  existence 
of  a regularizing  operator  defines  a stable  method  of  approximating  a solution  to  (113).  Therefore 
the  problem  of  finding  an  approximate  solution  of  (113)  that  is  stable  under  small  changes  in  the 
right  hand  side  reduces  to  finding  an  appropriate  regularization  operator  and  determining  the 
regularization  parameter.  This  is  called  the  Tychonov  regularization  method. 


Define  the  function 

Cliw)=\\wf 


(122) 


This  is  called  a stabilizing  function.  The  set  of  w such  that 

Iwf  <.  d (123) 

for  a fixed  d is  closed  and  bounded.  Equation  (120)  is  taken  as  the  stabilizing  function  and  will  be 
used  in  the  following  manner: 

Assume  F(w)  = Zq  has  a unique  solution  for  the  moment,  called  Wq.  Let  Zj  be  an  approximation 
to  Zq  such  that 

1^5  - Zolh  (124) 


Define 

(?5  = (w||F(w)  - z,|  s 6}.  (125) 


The  strategy  for  defining  a regularization  operator  will  be  to  look  for  elements  in  Qg  that  minimize 
n(w).  If  Wg  is  such  a number  then  the  mapping  Wg  = R(zg,6)  will  be  defined.  It  can  be  shown  (see 


40 


Tychonov  and  Arsenin  [17])  that  R(z,6)  is  a regularizing  operator  and  can  be  taken  as  an 
approximate  solution  of  (113)  for  z = Zq. 

From  this  it  is  clear  that  the  problem  of  finding  an  approximate  solution  of  (113)  with  approximate 
right  hand  side  Zg  consists  of  finding  Wg  where 

min 


and 

Q5  = {>^I||W  - Zfilh  (127) 


If  w = 0 is  not  an  element  of  Qg  then  it  can  be  shown  (Tychonov  and  Arsenin  [17])  that  the 
greatest  lower  bound  of  Q(w)  on  Qg  is  attained  at  an  element  Wg  such  that 

||^(n)  - z«|h  6-  (128) 


This  says  that  the  problem  of  solving  (113)  with  approximate  right  hand  side  Zg  is  equivalent  to  the 
conditional  extremum  problem  of  finding 

min||w|p  (12^) 

subject  to 

1F(h.)  - z,||  = 6.  (130) 


The  method  of  Lagrange  multipliers  is  used  to  solve  this  problem  by  minimizing 

L(w,a)  = ||F(w)  - Zgf  + alwf  (1^1) 

where  a is  determined  by  the  condition  that 
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II fK)  - Z.I  = 6 


(132) 


and  is  an  element  for  which  L(w,a)  attains  its  greatest  lower  bound.  Computationally  a can  be 
found  by  taking  a sequence  of  a’s 


OL 


k 


(133) 


where  ag  is  fixed  and  0 < q < 1 for  k = 0,  1,  2, ....  For  each  minimize  (129)  and  calculate  the 
differences 


P(V-^»||- 


(134) 


Select  such  that 


= 6. 


(135) 


In  practice  though  an  a is  selected  experimentally  (see  Grother  and  Blue  [8]). 

The  method  of  Lagrange  multipiers  is  used  in  the  program  in  Appendix  A.  In  the  subroutine 
FUNC  the  Lagrange  multiplier  is  called  wfactor  and  is  read  in  from  the  run  input  file. 


8.0  Statistical  Goodness-of-Fit  Measures 


Since  the  neural  net  model  assumed  in  the  feedforward  network  is  nonlinear  the  usual  statistical 
goodness-of-fit  tests  applied  in  linear  regression  analysis  are  not  directly  applicable.  However,  the 
residuals  between  the  target  training  data  and  the  predicted  data  can  be  compared  against  a normal 
distribution.  This  comparison  can  be  used  to  test  whether  the  distribution  of  residuals  forms  a 
normal  distribution  with  mean  0. 

Before  a test  against  a normal  distribution  is  made  the  statistics  to  look  at  are:  1.  Mean,  2.  Standard 
deviation  and  3.  Minimum  and  Maximum  values.  The  mean  and  standard  deviation  can  be  used 
to  transform  the  residual  errors  given  by 
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e(i)  = Target(i)  - Pred(i) 


(136) 


for  i = 1 to  the  number  of  samples  to  a standardized  form.  If  the  residuals  e(i)  satisfy  a normal 
distribution  with  sample  mean  e and  sample  standard  deviation  s,  for  sufficiently  large  samples, 

E(i)  = (e(i)  - e)/s  (137) 

approximates  a standard  normal  distribution,  i.e.  a normal  distribution  with  mean  0 and  standard 
deviation  1.  To  test  the  hypothesis  of  normality  the  distribution  E(i)  can  be  compared  to  a 
standard  normal  distribution.  The  test  can  be  performed  by  using  the  Kolmogorov-Smimov  test. 

The  Kolmogorov-Smimov  two-sample  test  is  a statistical  test  of  whether  two  independent  samples 
have  been  drawn  from  the  same  population  (or  from  populations  with  the  same  distribution).  The 
two-tailed  test  is  sensitive  to  any  kind  of  difference  in  the  distribution  from  which  the  two  samples 
were  drawn.  The  test  is  concerned  with  agreement  between  two  cumulative  distributions.  If  the 
two  samples  have  in  fact  been  drawn  from  the  same  population  distribution  then  the  cumulative 
distributions  of  both  samples  may  be  expected  to  be  fairly  close  to  each  other.  A large  enough 
deviation  between  two  sample  cumulative  distributions  is  evidence  for  rejecting  the  hypothesis  that 
they  are  drawn  from  the  same  distribution. 

The  test  involves  specifying  the  cumulative  frequency  distribution  which  would  occur  under  the 
theoretical  distribution  and  comparing  that  with  the  observed  cumulative  distribution.  The 
theoretical  distribution  represents  what  would  be  expected  under  the  null  hypothesis  which  is  that 
the  normalized  residuals  satisfy  the  normal  distribution  with  mean  0 and  standard  deviation  1.  The 
point  at  which  these  two  distributions,  theoretical  and  observed,  show  the  greatest  divergence  is 
determined.  This  statistic  is  used  to  perform  the  test. 

For  comparing  one  data  set  E(x)  to  a known  cumulative  distribution  the  statistic  of  interest  is 

® I 'E(^)  - -pw  I 


The  significance  level  of  an  observed  value  of  D is  given  approximately  by  the  equation 
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00 


(139) 


>=i 

The  significance  of  an  observed  value  of  D is  given  approximately  by 

Prob{D ^Observed  D)  = Qj^{\fN Observed  D)  (1^0) 


To  compute  the  standardized  cumulative  normal  one  computes 


X 


F(x)  = — fe  (141) 

v/27t 

for  X > 0 and  F(x)  — 1 - F(-x)  for  x < 0.  Instead  of  directly  computing  the  standardized  cumulative 
normal  one  Eq.  (139),  we  can  associate  F(x)  with  the  error  function 


X 


(142) 


by  the  relation  F(x)  = 0.5(1 +erf(xA/2))  for  x > 0 and  F(x)  = 1 - F(-x)  for  x < 0.  The  error 
function  can  be  approximated  by  a polynomial  as  given  in  the  ERF  subroutine  of  the  accompanying 
program. 


9.0  Geometric-Thermal  Error  Component  Mapping 


As  shown  earlier  in  Fig.  1,  there  are  multiple  sets  of  functions  used  as  inputs  to  a kinematic  model 
to  predict  work  volume  error.  This  study  takes  a limited  subset  of  those  data  sets  to  demonstrate 
the  nonlinear  regression  capabilities  of  a feedforward  neural  network  to  fit  these  data  sets.  Two 
data  sets  are  used.  The  measurement  given  in  these  two  data  sets  were  taken  at  different  times. 
They  are  the  x-axis  displacement  errors  moving  vertically  downward  and  vertically  upward  (see 
Figure  5).  The  data  shows  that  the  hysteresis  errors  in  approaching  a point  on  an  axis  must  be 
mapped  as  two  separate  functions. 

The  criteria  for  a good  fit  are  to  have  a distribution  of  the  residuals  after  an  optimization  or  test 
run  that  has  a near  zero  mean  and  has  a standard  deviation  of  less  than  half  a micrometer.  The 
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standard  deviation  requirement  was  based  on  the  resolution  of  the  correction  available  for  this 
machine  tool. 


9.1  X-Displacement  (Down) 

The  data  for  this  section  was  taken  while  the  cross-slide  of  the  lathe  moved  downwards. 

The  original  data  set  included  595  records  that  listed  nominal  position,  displacement  error  at  that 
nominal  position  and  40  thermocouple  readings.  For  the  locations  of  the  thermocouples  see  Figure 
5 and  Table  1.  Every  third  point  was  selected  for  a testing  set  of  data  (198  points).  The  rest  (397 
points)  were  taken  for  training. 

The  total  number  of  the  records  included  35  repetitions  of  17  samples  from  -12.7  mm  to  -215.9  mm 
along  the  x-axis,  with  a sampling  increment  of  -12.7  mm.  The  negative  sign  is  used  because  the 
machine  "home",  or  zero  position,  is  approximately  386  mm  above  the  spindle  axis.  A partial 
sample  of  the  data  set  is  shown  in  Table  2.  The  first  row  shows  the  number  of  data  samples  in  the 
data  set,  the  number  of  inputs  (41,  i.e.  one  nominal  position  and  40  thermocouples),  the  number 
of  outputs  (in  this  study  1),  and  a scale  factor  for  unit  conversion  if  necessary.  The  next  row  is  a 
set  of  column  titles.  "Position"  refers  to  the  column  beginning  with  -12.7  (nominal  position  in 
millimeters),  "Diff  refers  to  the  column  begirming  with  -1.1631E-3  (the  measured  error  in 
millimeters).  0,1,2  are  headers  for  the  next  three  columns.  Note  that  the  last  one  was  wrapped 
around  by  the  printer.  These  columns  are  the  thermocouple  readings  in  Celsius  for  thermocouples 
0, 1,  2.  The  columns  for  the  other  thermocouple  readings  are  stacked  in  groups  of  6 below  the  first 
397  records.  That  is  397  records  for  thermocouples  3 through  8 are  grouped  after  the  first  group, 
then  397  records  for  thermocouples  9 through  14,  etc. 

The  first  set  of  397  records  has  the  first  column  as  an  index  of  the  order  in  the  group  of  17  nominal 
position  samples.  Thus  1 in  column  1 indicates  the  first  of  17  positions,  2 the  second,  etc.  The 
group  numbers  repeat  for  every  repetition  of  the  17  samples.  Note  that  some  of  the  indices  are 
missing.  These  records  have  been  extracted  to  form  a file  of  test  value  to  check  the  nonlinear 
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Ballscrew  housing  38 
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Figure  5.  Turning  Center  with  Thermocouple  locations  identified 


LOCATION 

NO. 

LOCATION 

0 

Bottom  of  X-(Glass)  Scale 

20 

Left  End  of  Lower  Z-Way 

1 

Top  of  X-Scale 

21 

Left  End  of  Upper  Z-Way 

2 

Coolant  Tank 

22 

Right  End  of  Lower  Z-Way 

3 

Ambient 

23 

Right  End  of  Upper  Z-Way 

4 

Not  Used 

24 

Lower  FRont  of  Spindle  Head 

5 

Not  Used 

25 

Lower  Rear  of  Spindle  Head 

6 

Top  Right  of  Bed 

26 

Upper  Front  of  Spindle  Head 

7 

Right  of  Z-Scale 

27 

Upper  Rear  of  Spindle  Head 

8 

Right  Center  of  Z-Scale 

28 

Left  of  Top  of  Spindle  Head 

9 

Left  Center  of  Z-Scale 

29 

Middel  of  Top  of  Spindle  Head 

10 

Left  of  Z-Scale 

30 

Right  of  Top  of  Spindle  Head 

11 

Top  of  X- Way 

31 

Bottom  Left  of  Bed 

12 

Bottom  of  X-Way 

32 

Top  Left  of  Bed 

13 

Bottom  of  X-Head 

33 

Bottom  Right  of  Bed 

14 

Top  of  X-Head 

34 

Not  Used 

15 

Bottom  of  Z-Slide 

35 

Near  X-Drive  Motor  Shaft  Bearing 

16 

Top  Left  of  Z-Slide 

36 

Left  Z-Ballscrew  Bearing 

17 

Bottom  Right  of  Z-Slide 

37 

Right  Z-Ballscrew  Bearing 

18 

Top  Right  of  Z-Slide 

38 

X-Ballscrew  Housing 

19 

Hydraulic  Tank 

39 

Z-Ballscrew  Nut 

Table  1.  Thermocouple  locations  on  the  turning  center. 
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397  41  1 1.000000 

POSITION  DIFF.  012 


1 

22.61200 

-12.70000 

-1.1631000E-03 

22.58143 

22.58286 

2 

22.61400 

-25.40000 

-3.2432000E-03 

22.58286 

22.58571 

4 

22.61800 

-50.80000 

-7.4136001E-03 

22.58571 

22.59143 

5 

22.62000 

-63.50000 

-9.2676003E-03 

22.58714 

22.59429 

7 

22.62400 

-88.90000 

-1.2797900E-02 

22.59000 

22.60000 

8 

22.62600 

-101.6000 

-1.4575700E-02 

22.59143 

22.60286 

10 

22.63000 

-127.0000 

-1.7064501E-02 

22.59429 

22.60857 

11 

22.63200 

-139.7000 

-1.8334400E-02 

22.59571 

22.61143 

13 

22.63600 

-165.1000 

-2.0315200E-02 

22.59857 

22.61714 

14 

22.63800 

-177.8000 

-2.1559600E-02 

22.60000 

22.62000 

16 

22.64200 

-203.2000 

-2.3108700E-02 

22.60286 

22.62571 

17 

22.64400 

-215.9000 

-2.3184700E-02 

22.60429 

22.62857 

2 

22.69029 

-25.40000 

-4.2566000E-03 

22.63571 

22.71314 

3 

22.69543 

-38.10000 

-6.3721999E-03 

22.63857 

22.72971 

5 

22.70572 

-63.50000 

-1.0079800E-02 

22.64429 

22.76286 

6 

22.71086 

-76.20000 

-1.1552600E-02 

22.64714 

22.77943 

8 

22.72114 

-101.6000 

-1.5183800E-02 

22.65286 

22.81257 

9 

22.72629 

-114.3000 

-1.6554900E-02 

22.65571 

22.82914 

11 

22.73657 

-139.7000 

-1.8839600E-02 

22.66143 

22.86229 

12 

22.74171 

-152.4000 

-1.9727901E-02 

22.66429 

22.87886 

14 

22.75200 

-177.8000 

-2.1961600E-02 

22.67000 

22.91200 

15 

22.75714 

-190.5000 

-2.2824399E-02 

22.67286 

22.92857 

17 

22.76743 

-215.9000 

-2.3432100E-02 

22.67857 

22.96171 

1 

22.86857 

-12.70000 

-1.1750000E-03 

22.73372 

23.26657 

3 

22.88571 

-38.10000 

-5.3284001E-03 

22.74114 

23.27971 

4 

22.89429 

-50.80000 

-7.1814000E-03 

22.74486 

23.28629 

6 

22.91143 

-76.20000 

-1.0404900E-02 

22.75229 

23.29943 

7 

22.92000 

-88.90000 

-1.2283200E-02 

22.75600 

23.30600 

9 

22.93714 

-114.3000 

-1.5252300E-02 

22.76343 

23.31914 

Table  2.  A partial  sample  input  data. 
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regression. 


The  original  data  set  included  results  from  four  thermocouples  that  failed.  They  were  thermocoupl- 
es numbered  3,  4,  16,  and  19.  These  readings  were  set  to  zero  and  left  in  the  neural  net  training 
process  to  determine  whether  the  neural  net  algorithm  would  weight  the  fitting  towards  the  nonzero 
columns.  The  results  of  the  training  are  given  in  the  next  section. 

After  1000  iterations  of  the  conjugate  gradient  optimization  routine  the  residuals  between  the  target 
values  and  the  predicted  values  exhibited  a normal  distribution.  The  following  statistics  summarize 
the  distribution  of  residuals: 

Mean  = 0.4791046E-5  mm 
Std.  Dev.  = 0.3068930E-3  mm 
Range  = 0.2006001E-2  mm 
Minimum  = -0.1096E-2  mm 
Maximum  = 0.9100009E-3  mm 

A histogram  of  the  residuals  is  shown  in  Figure  6 . The  figure  shows  the  near  normality  of  the 
distribution  with  a slight  skew  to  the  positive  side.  The  normality  of  these  residuals  is  a sign  that 
the  errors  after  the  fit  exhibit  randomness  and  do  not  reflect  errors  in  the  model. 

Two  other  tests  of  the  normality  of  the  residuals  are  shown  in  Figures  7 and  8.  The  first  is  the 
normality  plot.  The  normality  plot  for  a given  distribution  is  a graphical  data  analysis  technique 
for  determining  if  the  given  distribution  provides  a good  distributional  fit  to  the  data.  The  vertical 
axis  is  the  ordered  raw  data.  The  horizontal  axis  is  the  ordered  statistic  from  the  normal 
distribution  with  mean  0 and  standard  deviation  1.  The  axis  shows  from  -3  standard  deviations  to 
+3  standard  deviations.  The  linearity  of  the  plot  is  of  interest.  The  more  linear  the  plot  the  better 
the  distributional  fit.  The  plot  exhibits  a linear  trend,  thus  indicating  normality. 

Figure  8 is  a correlation  coefficient  plot.  It  is  a graphical  data  analysis  technique  for  determining 
which  member  of  an  entire  family  of  distributions  provides  the  best  fit  to  the  data.  The  normal 
distribution  is  one  of  a large  family  of  distributions  called  the  Tukey  lambda  distributional  family. 
The  family  is  parameterized  by  a simple  value  extending  from  -2  to  2.  The  value  of  the  family 
representing  the  normal  distribution  is  0.14.  Thus  if  the  data  exhibits  a normal  distribution,  the 
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correlation  coefficient  with  each  of  the  members  of  the  Tukey  family  should  reach  a maximum  with 
respect  to  the  family  parameter  at  0.14.  In  fact  the  residual  data  achieves  a maximum  of  0.9911880 
at  0.14.  This  shows  how  close  the  residual  data  correlates  with  the  normal  distribution. 

These  results  were  generated  by  the  software  package  DATAPLOT  (see  Filliben  [19]  ) at  NIST. 

These  fitting  results  show  that  the  neural  net  algorithm  does  take  the  zero  columns  as  part  of  the 
input  pattern  and  still  reduces  the  regularized  RMS  error  so  that  the  standard  deviation  of  the 
residual  errors  falls  within  the  desired  value  of  less  than  half  a micrometer.  In  particular,  the 
standard  deviation  is  approximately  0.3  micrometers  with  a mean  of  approximately  0.005 
micrometers. 

After  fitting  the  neural  net,  the  optimum  weights  were  then  used  to  compute  the  predicted  output 
for  198  test  samples  of  the  X-Displacement  (Down)  data.  These  samples  were  all  selected 
separately  from  the  pattern  samples  used  to  fit  the  weights.  The  results  were  as  follows: 

Mean  = -0.2961085  E - 4 mm 
Std.  Dev.  = 0.2255760  E - 3 mm 
Range  = 0.2019778  E - 2 mm 
Minimum  = -0.1119867  E - 2 mm 
Maximum  = 0.8999112  E - 3 mm 

These  are  comparable  to  the  results  obtained  during  the  fitting  process.  They  again  fall  within  the 
desired  criteria  even  through  the  columns  with  zeroed  data  have  been  left  as  part  of  the  patterns. 
The  Kolmogorov-Smimov  statistic  for  the  residual  distribution  does  not  confirm  that  it  is  normally 
distributed.  However,  Figure  9 shows  that  the  major  position  of  the  distribution  is  centered  near 
zero  as  the  mean  suggests  and  falls  between  -5E-3  and  +5E-3.  This  statistical  test  is  thus  too  fine 
a test  for  the  distribution  generated.  Although  its  results  are  reported,  they  will  not  be  referred  to 
in  the  later  neural  net  fitting  trials. 
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Figure  6.  Histogram  plot  of  the  residual  error  distribution  after  the  fit  of  the  x-displacement  (down)  data  i 
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Figure  7.  Normality  plot  for  residuals  of  the  x-displacement  (down)  data. 
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Figure  8.  Correlation  Coefficient  Plot  for  the  residuals  of  the  x-displacement  (down)  data 


Another  training  run  of  1500  iterations  of  the  neural  net  was  done  but  without  the  zeroed  data 
from  the  four  failed  theromcouples.  There  were  then  37  input  nodes.  The  results  of  this  fit  are 

Mean  - 0.2527817  E - 5 
Standard  Deviation  - 0.1701827  E - 3 
Minimum  = -0.6710416  E - 3 
Maximum  = 0.7603451  E - 3 

These  values  show  that  the  neural  net  algorithm  was  able  to  reduce  the  standard  deviation  of  the 
error  well  below  the  accpetable  criteria.  Figure  10  shows  the  error  reduction  curve  for  the 
regularized  RMS  error  as  a function  of  the  iteation  cycles.  Figure  11  shows  the  histogram  of  the 
residual  errors. 

A test  was  performed  with  198  patterns.  The  residual  statistics  for  this  test  were: 

Mean  ==  -0.2718773  E - 4 mm 
Standard  Deviation  0.1935153  E - 3 mm 
Minimum  = -0.8775685  E - 3 mm 
Maximum  = 0.1151932  E - 2 mm 
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Residual  errors  are  in  millimeters. 
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Conjugate  Gradient  Iteration  Cycles 

Figure  10.  Error  reduction  curve  for  training  on  data  with  failed  thermocouples  removed.  Regularized  RMS  errors  are  in  millimeters. 
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Figure  11.  Histogram  of  residuals  for  training  data  with  failed  thermocouples  removed.  Residual  errors  are  in  millimeters. 
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Figure  11  A.  Histogram  of  residuals  for  testing  the  trained  neural  net  with  failed  thermocouples  removed.  Residual  errors  are  i 
millimeters.  ? 


9.2  X-Displa cement  (Up) 


In  this  data  set  all  columns  of  thermocouple  data  that  were  not  usable  were  eliminated  from  the 
fitting  process.  Only  35  thermocouples  were  operating  since  thermocouples  0,  4, 5, 17,  and  20  failed 
to  function.  Therefore,  36  inputs  to  the  neural  net  were  used  with  60  hidden  nodes  and  one  output 
node.  This  generated  2581  unknown  weights. 

After  1500  iterations  the  conjugate  gradient  algorithm  reported  the  following  statistics  for  the 
residual  error  between  the  measured  and  predicted  outputs: 

Mean  = 0.1994597  E - 5 mm 
Standards  Deviation  = 0.1338334  E - 3 mm 
Minimum  = -0.1036018  E - 2 mm 
Maximum  = 0.7032966  E - 3 mm 

The  regularized  RMS  error  as  a function  of  the  iterations  cycle  is  shown  in  Figure  12  and  the 
histogram  of  the  residuals  after  the  fitting  is  shown  in  Figure  13. 

A test  was  run  on  independent  data  using  the  fitted  weights  and  the  residual  errors  reported  were: 

Mean  = 0.2145889  E - 4 mm 
Standard  Deviation  = 0.1509593  E - 3 mm 
Minimum  = -0.8786396  E - 3 mm 
Maximum  = 0.7763319  E - 3 mm 

The  histogram  of  the  residuals  for  the  test  run  are  shwon  in  Figure  13a. 
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Conjugate  Gradient  Iteration  Cycles 

Figure  12.  Regularized  RMS  error  training  eurve  for  X-displaeement  upwards  with  failed  thermocouples  removed.  Regularized  RMS 
errors  are  in  millimeters. 
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Figure  13.  Histogram  of  residuals  after  neural  net  training  on  X-displacement  upward  data  with  failed  thermocouples  removed.  Residual 
errors  are  in  millimeters.  | 
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Residual  errors  are  in  millimeters. 


9.3  X-Displacement  (Down)  with  X-Axis  Related  Thermocouples 

In  this  numerical  experiment  only  X-Axis  related  thermocouples  as  shown  in  Table  1 are  used  for 
inputs.  The  thermocouples  used  in  the  fitting  of  the  data  were  numbered  0,  1,  11,  12,  13,  14,  35, 
and  35  in  Table  1.  The  fitting  algorithm  reported  the  following  results: 

Mean  = 0.1879598  E - 5 mm 
Standard  Deviation  = 0.3769576  E - 3 mm 
Minimum  = -0.2565682  E - 2 mm 
Maximum  = 0.1568761  E - 2 mm 

The  regularized  RMS  error  curve  is  shown  in  Figure  14  and  the  histogram  of  residual  errors  is 
given  in  Figure  15.  Note  that  the  residual  errors  in  this  case  seem  to  distribute  themselves 
bimodally  around  zero.  The  base  of  the  distribution  is  broader  than  those  resulting  from  fitting 
experiments  using  all  the  usable  thermocouples. 

A test  was  then  performed  on  198  data  patterns  not  used  in  the  fitting  process.  The  statistics 
associated  with  the  residuals  for  this  test  data  were: 

Mean  = -0.2894112  E - 4 mm 
Standard  Deviation  = 0.3918001  E - 3 mm 
Minimum  = -0.2590802  E - 2 mm 
Maximum  = 0.1427975  E - 2 mm 

The  histogram  of  the  test  residuals  is  given  in  Figure  15A. 

9.4  X-Displacement  (Down)  with  Z-Axis  Related  Thermocouples 

In  this  section  only  the  Z-Axis  related  thermocouples  are  used  as  inputs.  From  Table  1 the 
thermocouples  used  were  7,  8,9,  10, 15, 17, 18,  20,  21,  22,  23,  36,  37,  and  39.  Although  16  is  z-axis 
related,  it  failed  and  was  not  used.  The  fitting  algorithm  reported  the  following  results: 

Mean  = 0.9919781  E - 6 mm 
Standard  Deviation  = 0.1879221  E - 3 mm 
Minimum  = -0.1144482  E - 2 mm 
Maximum  = 0.6796103  E - 3 mm 
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Figure  15.  Histogram  of  residuals  after  training  with  X-displacement  downwards  data  for  X-axis  related  thermocouples.  Residual  errors 
are  in  millimeters. 
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thermocouples.  Residual  errors  are  in  millimeters. 


The  regularized  RMS  error  curve  is  shown  in  Figure  16  and  the  histogram  of  the  residual  errors 
is  given  in  Figure  17. 

A test  was  performed  on  198  separate  data  patterns.  The  test  statistics  of  the  residuals  were: 

Mean  = -0.3042790  E - 4 mm 
Standard  Deviation  = 0.2007157  E - 3 mm 
Minimum  = -0.1336623  E - 2 mm 
Maximum  = 0.9711841  E - 3 mm 

A histogram  of  the  test  residuals  is  given  in  Figure  17A. 


9.5  X-Displacement  (Up)  with  X-Axis  Related  Thermocouples 

This  numerical  experiment  repeats  that  described  in  section  9.3  but  used  the  data  measured  while 
the  tool  turret  moved  upwards.  Seven  thermocouples  were  used.  They  were  1,  11,  12,  13,  14,  35, 
and  38  as  shown  in  Table  1.  The  fitting  algorithm  reported  the  following  results: 

Mean  = 0.5959608  E - 5 mm 
Standard  Deviation  = 0.2965477  E - 3 mm 
Minimum  = -0.1392284  E - 2 mm 
Maximum  = 0.124761  E - 2 mm 

The  regularized  RMS  error  curve  is  shown  in  Figure  18  and  the  histogram  of  residual  errors  is 
given  in  Figure  19. 

A test  of  the  fitted  weights  was  performed  on  297  patterns  of  data.  The  test  statistics  of  the 
residuals  were  given  as: 

Mean  = 0.2316002  E - 4 mm 
Standard  Deviation  = 0.2984127  E - 3 mm 
Minimum  = -0.1407236  E - 2 mm 
Maximum  = 0.1203012  E - 2 mm 
A histogram  of  the  residuals  is  given  in  Figure  19 A. 
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Figure  17.  Histogram  of  residuals  after  training  of  X-displacement  downwards  data  against  Z-axis  related  thermocouples.  Residual  errors 
are  in  millimeters. 
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Figure  17A.  Histogram  of  residuals  for  test  data  for  X-displacement  downwards  against  Z-axis.  Residual  errors  are  in  millimeters 
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Figure  19.  Histogram  of  residuals  for  training  data  of  X-displacement  upwards  data  against  X-axis  related  thermocouples.  Residual  errors 
are  in  millimeters. 
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Figure  19A.  Histogram  of  residuals  of  test  data  of  X-displacement  upwards  against  X-axis  related  thermocouples.  Residual  errors  are 
in  millimeters. 


9.6  X"Displacement  (Up)  with  Z-Axis  Related  Thermocouples 

This  numerical  experiment  is  similar  to  that  described  in  section  9.4  except  that  data  taken  while 
the  tool  turret  moved  upwards  was  used.  In  this  case,  thermocouples  numbered  7,  8,  9, 10,  15, 18, 
21,  22,  23,  36,  37,  and  39  were  used  (see  Table  1).  The  fitting  results  reported  were: 

Mean  = 0.1878900  E - 6 mm 
Standard  Deviation  = 0.3308610  E - 3 mm 
Minimum  = -0.2056841  E - 2 mm 
Maximum  = 0.1320060  E - 2 mm 

The  regularized  RMS  errors  are  shown  in  Figure  20  and  the  histogram  of  residuals  is  shown  in 
Figure  21. 

A test  was  performed  on  297  data  patterns.  The  test  statistics  were: 

Mean  = 0.1977327  E - 4 mm 
Standard  Deviation  = 0.3356497  E - 3 mm 
Minimum  = -0.1775119  E - 2 mm 
Maximum  = 0.1378149  E - 2 mm 
A histogram  of  the  residuals  is  given  in  Figure  21A. 

9.7  X-Displacement  (Down)  with  50  Hidden  Nodes 

This  numerical  experiment  uses  36  thermocouples  but  reduces  the  number  of  hidden  nodes  used. 
This  in  effect  reduces  the  number  of  logistic  function  basis  functions  used  in  the  assumed  function 
representation.  The  fitting  results  reported  were: 

Mean  = 0.8658474  E - 6 mm 
Standard  Deviation  = 0.2031533  E - 3 mm 
Minimum  = -0.1155412  E - 2 mm 
Maximum  = 0.1089299  E - 2 mm 

Figure  22  shows  the  regularized  RMS  error  and  Figure  23  shows  the  distribution  of  residual  errors. 
The  significance  of  the  results  here  is  that  a reduction  from  60  hidden  nodes  to  50  hidden  nodes 
for  this  data  set  did  not  change  the  residual  distribution  in  any  great  degree. 
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Figure  21.  Histogram  of  residual  errors  for  training  of  X-displacement  upwards  data  against  Z-axis  related  thermocouples, 
errors  are  in  millimeters. 
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Figure  21  A.  Histogram  of  residual  errors  for  testing  X-displacement  upwards  data  against  Z-axis  related  thermocouples.  Residual  errors 
are  in  millimeters. 
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nodes.  Residual  errors  are  in  millimeters. 


A test  was  performed  on  198  data  patterns.  The  statistics  for  the  residuals  were: 

Mean  = -0.3120217  E - 4 mm 
Standard  Deviation  = 0.2221139  E - 3 mm 
Minimum  = -0.1341797  E - 2 mm 
Maximum  = 0.1040228  E - 2 mm 

A histogram  of  the  residuals  is  given  in  Figure  23A. 

9.8  X-Displacement  (Down)  with  40  Hidden  Nodes 

This  numerical  experiment  uses  36  thermocouples  but  reduces  the  number  of  hidden  nodes  used 
to  40.  The  fitting  results  reported  were: 

Mean  — 0.8227056  E - 6 mm 
Standard  Deviation  = 0.2425290  E - 3 mm 
Minimum  = -0.8013640  E - 3 mm 
Maximum  = 0.1270486  E - 2 mm 

Figure  24  shows  the  regularized  RMS  error  and  Figure  25  shows  the  distribution  of  residual  errors. 
In  this  case,  there  appears  to  be  a general  broadening  of  the  distribution. 

A test  was  performed  on  198  data  patterns  and  the  residual  statistics  were: 

Mean  = -0.2879812  E - 4 mm 
Standard  Deviation  = 0.2538050  E - 3 mm 
Minimum  = -0.1000706  E - 2 mm 
Maximum  = 0.1212418  E - 2 mm 

A histogram  of  the  residuals  is  given  in  Figure  25A. 

10.  Conclusions 

The  first  observation  that  can  be  made  is  that  the  scaled  conjugate  gradient  algorithm  is  a very 
consistent  optimization  technique  for  determining  weights  in  a neural  network.  Table  3 assigns 
index  numbers  from  1 to  9 to  the  sample  training  runs  made.  In  Table  4 the  training  run 
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nodes.  Residual  errors  are  in  millimeters 


X-Displacement  (Down)  Train 
Using  40  Hidden  Nodes 


CO 

J0 

o 

O 

c 

o 

’■!-» 

CD 

u. 

(D 


C 

0 

TD 

0 

u. 

0 

0 

+-» 

0 

O) 

O 

o 


JOJJ3  si/\iy  p0zuB|n60U 


c 

o 


(U 

*0, 

D 


o 

> 

o 

CC 


cd 


«3 

•o 


CA 

l-l 


C3 


O 

CO 


•a 

(U 

N 

*c 

cC 

1 

<U 


O'! 

(D 

i-i 

h* 


82 


nodes.  Regularized  RMS  errors  are  iri  millimeters. 
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Residual  Errors 

Figure  25.  Histogram  of  residuals  for  training  X-displacement  downwards  data  against  all  active  thermocouples  using  40  hidden  nodes. 

Residual  errors  are  in  millimeters 


X-Displacement  (Down)  Test 
Using  40  Hidden  Nodes 


84 


Residual  errors  are  in  millimeters. 


statistics  are  summarized.  The  residual  error  statistics  are  all  grouped  very  tightly  as  shown  in 
Table  4.  Since  the  number  of  connections  in  a network  influences,  the  elasped  time  for  a solution, 
the  iteration  time  per  connection  was  computed  for  each  run.  This  only  varied,  with  rounding, 
between  0.001  and  0.002  seconds.  The  elapsed  time  per  connection  varied  between  1.3  and  2.3 
seconds.  All  of  the  neural  network  computation  were  performed  in  FORTRAN  on  a CONVEX 
C3820  with  the  UNIX  operating  system.  The  other  run  time  statistics  are  also  tightly  grouped. 
Table  4 gives  the  mean  cpu  time  per  iteration,  the  standard  deviation  of  the  cpu  time  for  iteration, 
the  total  elasped  time  for  each  run,  the  iteration  time  per  network  connection  and  the  elasped  cpu 
time  per  network  connection. 

The  next  observation  that  can  be  made  about  the  scaled  conjugate  gradient  algorithm  is  that  it  is 
easy  to  use  and  converges  rapidly  as  shown  in  the  figures  in  section  9.  It  does  not  require  multiple 
submission  of  randomized  data  patterns  as  the  steepest  descent  algorithm  sometimes  requires.  It 
adaptively  adjusts  its  steps  and  does  not  require  the  user  to  guess  at  learning  and  momentum 
parameters  as  many  implementations  of  the  back-propagation  algorithm  requires.  It  can  handle 
large  network  problems  as  consistently  as  moderate  to  small  network  problems. 

Finally,  when  neural  network  methods  are  compared  with  regression  methods  for  mapping,  both 
have  advantages  and  disadvantages.  Regression  analysis  has  the  advantage  that  it  is  possible  to 
identify  a physical  interpretation  to  the  final  fitted  equations.  It’s  disadvantage  is  the  difficulty  of 
selecting,  for  example,  the  polynomial  forms  of  temperatures  and  nominal  positions  to  enter  the 
regression.  Neural  networks  clearly  will  map  thermal  and  nominal  position  data  to  position  errors 
comparatively  rapidly  and  consistently.  However,  it  is  difficult  to  assign  a physical  interpretation 
to  the  weighted  network  other  than  to  say  that  it  maps  the  training  data. 

As  long  as  the  objective  remains  only  mapping  data,  neural  networks  demonstrate  a clear  advantage 
of  ease-of-use.  But,  if  real  time  control  is  required,  then  judicious  choice  of  regression  polynomials 
hold  an  advantage  unless  computer  hardware  can  be  built  to  evaluate  neural  networks  with 
comparable  speeds. 
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Run 

Direction 

Hidden  Node 

Thermocouple  Selection  : 

1 

Down 

60 

All 

2 

Down 

60 

Operating 

3 

Up 

60 

Operating 

4 

Down 

60 

X-Axis  Related 

5 

Down 

60 

Z-Axis  Related 

6 

Up 

60 

X-Axis  Related 

7 

Up 

60 

Z-Axis  Related 

8 

Down 

50 

Operating 

9 

Down 

40 

Operating 

Table  3.  Run  Summary 
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ON 

C/^ 

CO 

’C 

O 

■4-> 

O 

c3 

«-< 

a 

JH 

O 

Down 

397 

VO 

i> 

40 

1561 

1500 

0.83E-6 

0.24E-3 

-0.80E-3 

CN 

1 

o 

o 

9 

S 

ii> 

CO 

1.81 

0.21 

2722.2 

.0012 

1.74 

CO  ■ 

Down 

397 

36 

37 

50 

1951 

1500 

0.87E-6 

0.20E-3 

-0.12E-2 

o.lle-2 

2.25 

zro 

3370.9 

.0012 

1.73 

o- 

D 

298 

CM 

o 

VO 

o 

ON 

1500 

b: 

S 

J 

0.19E-6 

0.33E-3 

-0.21E-2 

c!> 

r—i 

d> 

0.94 

0.88E-1 

1405.9 

o 

o 

p 

1.56 

VO 

a, 

D 

298 

oo 

o 

VO 

r-H 

o 

VO 

1500 

0.60E-5 

0.30E-3 

-0.14E-2 

CO 

1 

o 

cs 

t-h 

o 

0.71 

0.53E-1 

rz.901 

.0012 

1.78 

Down 

397 

o 

VO 

1021 

1500 

0.99E-6 

0.19E-3 

-O.lle-2 

0.68E-3 

1.37 

0.27E-1 

2053.0 

.0013 

2.01 

Down 

397 

00 

ON 

o 

VO 

VO 

VO 

1500 

c« 

. 

<A 

CO  i 

9 

3 

*s 

: ..4>  . 

0.19E-5 

0.38E-3 

-0.26E-2 

0.16E-2 

n 

«4-»- 

t/i 

*-l3 

cd 

c/i 

§ 

c 

9 

lOT 

0.59E-1 

1521.6 

.0015 

2.30 

c 

3 

o. 

298 

36 

37 

o 

VO 

2281 

1500 

0.73E-6 

0.17E-3 

00 

o 

1 

0.59E-3 

1.99 

iro 

r6Z.6Z 

ON 

o 

o 

o 

1.31 

CSl 

Down 

397 

36 

37 

o 

VO 

2341 

1500 

0.25E-5 

0.17E-3 

-0.67E-3 

0.76E-3 

2.69 

0.21 

4037.6 

.0011 

1.72 

Down 

397 

40 

t-h 

o 

VO 

2581 

1500 

0.33E-5 

0.17E-3 

-0.67E-3 

0.93E-3 

3.01 

0.28 

4511.6 

.0012 

1.75 

c: 

9 

Direetion 

Input  Patterns 

Thermoeouples 

Input  nodes 

Hidden  nodes 

Output  nodes 

Connections 

Iteractions 

Mean 

Std.  Dev. 

Minimum 

Maximum 

Mean  Time/Iteration 

Std.  Dev. 

Elasped  Time 

Iteration  Time/Connection 

Elasped  Time/Connection 

Table  4.  Summary  of  training  run  results. 
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APPENDIX  A 

MAIN  PROGRAM  LISTING 
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c******** ***************************************** ********************* 

c 

c NETCG.F 

c 

c*********************************************************** *********** 
c 

c Main  Routine 

c 

c This  program  implements  a fully  connected  feedforward  neural  net. 
c The  program  allows  two  alternatives: 

c 1.  If  previous  weights  have  been  computed  for  the  network 

c links  then  the  progreun  can  be  used  to  compute  net 

c outputs  from  specified  inputs. 

c 2.  If  weights  are  not  availeible  then  the  program  Ccui  be 

c used  to  compute  weights  by  a least  squares  minimization 

c technique  using  predefined  data  patterns  auid  a scaled 

c conjugate  gradient  technique, 

c 

c The  scaled  conjugate  gradient  technique  employed  is  due  to: 
c M.  F.  Holler,  "A  scaled  conjugate  gradient  algorithm  for  fast 
c supervised  learning,”  Neural  Networks  (To  appear) 
c 

c Original  Progreoa  by: 
c Jaunes  L.  Blue 

c Computing  and  Applied  Mathematics  Laboratory 

c National  Institute  of  Standards  and  Technology 

c Gaithersburg,  MD  20899 

c 

c Modified  by: 
c David  E.  Gilsinn 

c Memufacturing  Engineering  LzdDoratory 

c National  Institute  of  Standards  and  Technology 

c Gaithersburg,  MD  20899 

c 

c Modifications  made  to  apply  neural  nets  as  a mapping  technique 
c for  machine  tool  errors  as  functions  of  nominal  axis  tool  position 
c and  thermocouple  readings.  These  mapped  errors  are  dlncorporated 
c into  a geometric-thermal  error  correction  model  to  predict 
c machine  tool  errors  in  the  work  volume  as  a function  of  nominal 
c tool  position  and  thermocouple  readings 
c 

c Run  specification  file:  netcg.in 

c This  file  must  exist  before  program  execution.  It  contains  run 
c information  such  as  data  file  names,  network  parameters  and 
c convergence  criteria 
c 
c 
c 

c Definitions: 
c maucpat 

c maxins 

c maxhid 

c meixout 

c meucwsize 

c w(maxwsize) 

c 
c 
c 

c wsav(maxwsize) 

c error 


- meucimum  allowed  input  patterns  for  teaching 

- meucimum  n\imber  of  input  nodes  allowed 

- maximum  number  of  hidden  nodes  allowed 

- meucimum  number  of  of  output  nodes  allowed 

- maximum  number  of  weights  allowed 

- array  of  weights.  The  first  numl  are  the 
weights  on  the  input  to  hidden  links  and 
the  rest  are  the  weights  on  the  hidden  to 
output  links. 

- temporary  array  for  weights. 

- current  error  function  value 
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c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


3 

p 

r 

nruns 

nseed 

egoal 

gwgoal 


errdel 

nfreq 

npats 
ninp 
nhid 
nout 
ncalls 

numl 

nu2n2 

numw 

vinp (maxpat , maxins+1 ) 

vout (maxpat , maxout ) 
target (maxpat , maxout) 
fpspec 
fppat 
fpgetw 

fpputw 

fprun 

wfactor 


output  layer  node  index 
pattern  number  index 
run  number  index 
number  of  runs 

integer  seed  number  for  uniform  random 
number  generator 

convergence  goal  for  the  RMS  of  error 
convergence  goal  for  the  ratio  of  the 
RMS  of  the  gradient  to  the  RMS  of  the 
weights.  Measure  relative  change  of  the 
error  to  the  weights. 

required  error  reduction  factor  for  the 
RMS  of  error  every  nfreq  iterations 
frequency  for  error  checking  and  printing 
of  progress. 

number  of  training  patterns  for  a given  run 

number  of  input  nodes 

number  of  hidden  nodes 

number  of  output  nodes 

number  of  calls  to  network  evaluation 

subroutine  forward 

number  of  weights  in  links  between*  input 
and  hidden  nodes 

number  of  weights  in  links  between  hidden 
and  output  nodes 

total  number  of  weights  = numl  + num2 


- input  data.  Extra  location 
accounts  for  threshold  dummy  data. 

- confuted  output  data. 

- output  data  for  training. 

file  id  for  file  names  and  run  parameters 
file  id  for  training  pattern  data  in  fnpat 
file  id  for  existing  weights,  if  emy,  in 
fngetw 

file  id  for  storing  final  weights  after 
training  in  fnputw 

file  id  for  run  summ2a:y  and  error  messages 
in  fnrun 

regularization  pareoneter 


C 4: 4r  4: 4r  4r  4r  * 4r  4: 4: 4: 4r  4:  * 4r  * 4r  4: 4;  4: 4r  4r  ir  4;  4r  4r  * ib  4:  ir  4r  4r  4: 4r  4r  ir  4r  * 4: 4;  A tfc  4;  4r  4r  ir  4;  4r  * 4r  4;  ir  * ir  4;  4; 

C 
C 
C 
C 
C 
C 
C 
C 


Parameter  Specifications 

MAXP  - Maximum  number  of  data  patterns 
MAXI  - Maximum  number  of  input  units 
MAXH  - Meucimum  number  of  hidden  units 
MAXO  - Maximum  number  of  output  units 


C********************************************-/:************************* 

parameter  (MAXP  = 3500) 
parameter  (MAXI  =42) 
parameter  (MAXH  = 100) 
parameter  (MAXO  = 26) 

parameter  (MAXW  = MAXH4r  (maxI+1)  + MAX04r  (MAXH+1) ) 
c********************************************************************** 
c 
c 
c 


Variedjle  Declarations 
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c************** ******************************************************** 
real  w (MAXW) 
real  wsav(MAXW) 

real  vinp(MAXP,  MAXI+1)  , data(MAXP,MAXI+l) 
real  vout(MAXP,  MAXO) , err(MAXP,MAXO) , temp(MAXP) 
real  target (MAXP,  MAXO) 

real  giii(MAXW)  , wnew(MAXW)  , pvect  (MAXW)  ,rvect( MAXW) 
real  svect(MAXW)  , vhid(MAXP,MAXH+l)  ,deltal(MAXP,MAXH) 
real  delta2  (MAXP,MAXO) , hderiv (MAXH)  ,oderiv(MAXO) 
integer  ithenn(MAXI) , idiim(MAXP) , idatfl(MAXI+l) 
integer  p 
integer  r 

integer  fpspec,  fppat,  fpgetw,  fpputw,  fprun,  fpscr,  fpgrph 

integer  fpgrwl,fpgrw2,  fpgrer 

character*80  title 

character *40  fnrun 

character*40  fnpat 

character*40  fngetw 

character*40  fnputw 

character*40  fngrph 

character*40  fngrwl^  fngrw2,  fngrer 

character*!  key,  aster (80) 

c**** ********************************************************** ******** 

c 

c Begin  Executables 

c 

Cic^c'k^c^e^c^c^e^c^^^c^c^:^c-k^c^c^c^e^c•1cic^c•k-k1:ie^c^(^:^c^e1c^:^t^l^c^c1c^c1c1cie^c^c^c^c4^^c^e^e^t^e^cic^c^e^:^c^c•k^:•k-kieie-kic^kic 

C 

do  2 i *=  1,80 
aster (i)  = **» 

2 continue 

maucpat  = MAXP 
maxtns  = MAXI 
maxhid  = MAXH 
. maxout  = MAXO 
naucwsize  = MAXW 

c set  file  designation  unit  numbers 
c 

fpspec  = 9 
fppat  = 10 
fpgetw  = 11 
fpputw  = 12 
fprun  = 13 
fpscr  = 14 
fpgrph  = 15 
fpgrwl  = 16 
fpgrw2  = 17 
fpgrer  = 18 
c 

c open  file  for  run  specifications 
c 

open  (fpspec,  file  = ‘netcg.in*,  status  = *old*) 
write (*,9000) 

9000  format (*  Reading  run  specification  file*) 
c 

c»»  Read  first  line:  number  of  runs 
c 

read (fpspec,  *)  nruns 
write (fprun, 999)  nruns 
c 
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c Begin  major  loop  over  the  number  of  runs, 
c 

do  23000  r = 1,  nruns 
c 

C*****************************icieic*’kicicie-kicie***-k***ic***i:*ie’k*****-k1c-k*ic’k'k 

C 

c For  each  run  read  the  run  specification  file 
c 

c************* ************************************ ****************** 

c 

c get  the  run  title  (<=  80  char) 
c 

read ( f pspec , * ) title 

c»»  read  file  names,  one  neune  per  line 
c 

c first  file  ncune:  run  output  file  name 
c 

read ( f pspec , * ) f nrun 
c 

c second  file  name:  pattern  input  file  n€uiie 
c 

read (f pspec,  *)  fnpat 
c 

c third  file  name:  initial  weights  file  name 
c 

read ( f pspec , * ) f ngetw 
c 

c fourth  file  naone:  final  weights  output  file  name 
c 

read ( f pspec , * ) f nputw 
c 

c fifth  file  name:  graphics  output  file  (Regularized  error  during  opt) 
c 

read ( f pspec , * ) f ngrph 
c 

c sixth  file  name:  input-to-hidden  weights  graphics  output 
c 

read ( f pspec , * ) f ngrwl 
c 

c seventh  file  naune:  hidden-to-output  weights  graphics  output 
c 

read (f pspec,*)  fngrw2 
c 

c eighth  file  name:  error  graphics  file  (after  fit) 
c 

read ( f pspec , * ) f ngrer 
c 

c»»'read  network  parameters  for  the  run 
c 

c number  of  input  patterns 
c ninnber  of  input  nodes 
c niimber  of  hidden  nodes 
c number  of  output  nodes 

c regularization  coefficient  of  |w|  in  error  function  (ex.  l.e-3) 
c nseed  is  0 if  reading  weights,  else  rcindom  number  seed  (ex.  12345) 
c 

read ( f pspec , * ) npats , ninp , nhid , nout , wf actor , nseed 
c 

c test  parameters  read  against  boxmds 
c 
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if(npats  .gt.  maxpat)  then 

print  *f  *Have  npats,  • patterns;  limit  is  maxpat 
stop 

else  if(ninp  .gt.  maxins)  then 

print  *Have  ninp,  • input  nodes;  limit  is  *,  maxins 
stop 

else  if(nhid  .gt.  maxhid)  then 

print  *,  *Have  *,  nhid,  • hidden  nodes;  limit  is  *,  maxhid 
stop 

else  if(nout  .gt.  maxout)  then 

print  *f  ‘Have  *,  nout,  * output  nodes;  limit  is  maxout 
stop 
end  if 
c 

c»»  read  convergence  parcuneters  for  the  current  run 
c 

c number  of  iterations  through  the  data  for  conjugate  gradient  routine 
c if  set  nonzero  then  the  iterations  are  for  training  (ex.  200) 

c if  zero  then  this  flags  a testing  run 

c goal  for  error  (RMS)  (ex.  0.01) 
c goal  for  g (RMS)  / w (RMS)  (ex.  l.e-12) 
c frequency  for  checking  convergence  (ex.  10) 
c quit  if  error  reduction  too  small  (ex.  1.0) 
c 

read ( fpspec , *) niter, egoal , gwgoal , nf req, errdel 
c 

c read  data  column  flags.  These  determine  which  input  columns 
c enter  the  fitting  process 
c 

read (fpspec,*)  (idatfl(i),  i=l,20) 
read (fpspec,*)  (idatfl(i),  i = 21,41) 
c 

c»»  end  of  run  specification  input 
c 

close (fpspec) 
c 

Q'kic’kieitle'k'kiticle’kitleit'k'kie'k'kie'kieltlelt'k'kie'kie'kielele'k'k'k’kitie’k'kie’kit'kifk'kitieie'k'k'k'k’kifk’kieifkie'kic-k* 

C 

c end  of  run  specification 
c 

c begin  reading  data  patterns  for  training  or  testing 
c 

c********************************************************************* 

c 

c compute  number  of  weights  for  each  layer 
c 

c first  get  number  of  weights  between  input  nodes  and  hidden  nodes 
c 

numl  = nhid*(ninp+l) 
c 

c next  between  the  hidden  nodes  and  the  output  nodes 
c 

num2  = nout*(nhid+l) 
c 

c total  number  of  weights  both  layers 
c 

numw  = numl  + num2 
c 

c open  output  file  for  the  run 
c 
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open(fprun,  file  = fnrun,  status  = 'un3aiown*) 
c 

c write  output  header 
c 

9005 

9007 

9008 


9009 

9019 


c 

c check  for  training  or  testing  run 
c 

if (niter  .gt.  0)  then 

write(fprun  , 988)  * Training  on  *,  fnpat 
else 

print  *,  ' ' 

write(fprun  , 988)  * Testing  on  fnpat 
end  if 

wr i te ( f prun , 9 0 0 9 ) 
c 

c get  the  inputs  and  target  patterns  to  be  learned 
c 

open(fppat,  file  = fnpat,  status  = ‘old*) 
c 

c next  call  reads  data  file  to  get  network  parmeters 
c and  sets  up  data  arrays  vinp  and  target 
c 

write (*,9010) 

9010  format (*  Reading  pattern  data*) 

call  getpat (fppat, data, idatfl, vinp, target, npats,ninp,nout, 

1 maxpat , maxins , maxout , itherm , idum , slope , tmax) 

close (fppat) 
write(fprun,9011)  npats 

9011  formate  The  number  of  data  patterns  is  *,i6) 
wr ite ( f prun , 9 0 09 ) 

write (f prun, 9007)  (aster( j) , j=l,80) 
wr ite ( f prun , 9 0 09 ) 
c 

c write  out  network  parameters 
c 

wr it e ( f prun , 9 0 2 1 ) 

9021  format ( 2 9X, 'NETWORK  SPECIFICATION') 
wr i t e ( f prun , 9 0 0 9 ) 

write (f prun, 9007)  (aster (j ) ,j=l, 80) 
write ( f prun, 9009 ) 
write ( f prun, 9022 ) 

9022  format (12x, 'INPUT  NODES *, IIX, 'HIDDEN  NODES *, IIX, 'OUTPUT  NODES') 
wr ite ( f prun , 9 0 2 3 ) 

9023  format (/) 

write (f prun, 9024)  ninp,nhid,nout 

9024  format(15x,i2,2(22x,i2) ) 


wr i t e ( f prun , 9 0 0 5 ) 
format ( ' 

wr i te ( f prun , 9 0 07 ) 
format ( lx , 8 Oal ) 
write (f prun, 9008)  title 
format ( lx , a8  0 ) 
wr ite ( f prun , 9 007 ) 
write ( fprun, 9009 ) 
format (//) 
write ( fprun , 9 019 ) 
format ( * 

wr i t e ( f prun , 9 0 0 9 ) 
write ( fprun , 9 0 07 ) 
wr i t e ( f prun , 9 0 0 9 ) 


NEURAL  NET  RUN  REPORT*) 
(aster(j) , j=l,80) 


(aster(j) , j=l,80) 

DATA  SET*) 
(aster(j) , j=l,80) 
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wr i te ( f prun , 9 0 2 3 ) 
write ( f prun ,9025) 

9025  format (8X, 'WEIGHTS  (I  TO  H) *, 8x, 'WEIGHTS  (H  TO  O) ', 9x, 'TOTAL  WEIGHTS') 
wr ite ( f prun , 9 0 2 3 ) 

write (f prun, 9026)  numl , num2 , numw 

9026  format (14x, i4,20x, i4, 19x, i4) 
wr  i te  ( f pr\in , 9 0 0 9 ) 

write (f prun, 9007)  (aster(j) , j=l,80) 
wr i t e ( f prun , 9 0 0 9 ) 
c 

C *************************************************************'**'*  * 

c 

c end  of  data  pattern  input 
c 

c begin  entering  initial  network  weights 
c either  random 

c or  previously  saved 

c 

0***************************************************************** 

c 

c get  initial  weights 
c 

c if  nseed  is  a positive  integer  generate  random  initial  weights 
c 

write (*,9020) 

9020  formate  Generating  network  weights') 

wr i t e ( f prun , 9 0 2 7 ) 

9027  format(33x, 'RUN  PARAMETERS') 
wr ite ( f prun , 9 0 09 ) 

write (f prun, 9007)  (aster(j) , j=l,80) 
wr ite ( f prun , 9 0 0 9 ) 
if (nseed  .le.  0)  then 
c 

c otherwise  get  weights  from  the  file  fngetw 
c 

open(fpgetw,  file  = fngetw,  status  = 'old') 
write (fprun,  987)  fngetw 

call  setwts(fpgetw,  w,  w(numl+l) , 0,ninp,nhid,nout) 
close  ( f pgetw) 
c 

c generate  random  weights  for  nseed  > 0 
c 

else 

write (fprun,  986)  nseed 

call  setwts(0,  w,  w(numl+l) , nseed, ninp,nhid,nout) 
end  if 
c 

c**************************************************************** 

c 

c end  of  initial  weight  generation 
c 

c begin  weight  optimization  by  the  conjugate  gradient  method 
c 

c******* ************************************************** ******* 
c 

c write  out  the  regularization  pareuneter 
c 

write (fprun,  984)  wf actor 
c 

c save  the  current  weights  for  computing  the  rms  change  in  weights 
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c for  each  training  run 
c 

do  23014  n = 1,  numw 
wsav(n)  = w(n) 

23014  continue 

c 

c write  out  convergence  criteria  for  a training  run,  ignore 
c for  a testing  run 
c 


9028 


90281 


9029 

90282 
90280 

& 

90283 

90284 

90285 

& 
c 

c Open  scratch  and  graphics  files 
c 

open ( f pscr , status^  * scratch  * ) 
open ( f pgrph , f ile=f ngrph , status=  * unknown  * ) 
open  ( f pgrer , f ile=f  ngrer , status=  • unknown  * ) 
c 

c initialize  calls  to  subroutine  forward.  This  subroutine  evaluates 
c the  network  output  for  a given  input  pattern  given  the  current 
c weights 
c 

ncalls  = 0 
c 

c Do  the  training  or  testing  depending  on  niter 
c niter  = 0 for  testing 

c niter  > 0 for  training 

c 

c In  either  case  the  error  is  returned  with  an  appropriate  flag 
c 

iwrt  = 0 
write (*,9030) 

9030  format (*  Entering  optimization  phase.') 

call  optwts (niter,  numw,  w,  error,  gw,  iter,  ierr,numl, 

1 num2,wf  actor,  vinp,vout,  target,  npats,ninp, 

2 nhid , nout , maxpat , meocins , maxhid , meocout , 

3 maxwsize , ncalls , gm , wnew,  pvect , rvect , svect , vhid. 


if (niter  .gt.  0)  then 
write ( f prun ,9028) 
format ( * 1 * ) 

write (f prun, 9007)  (aster( j) , j=l,80) 
write ( f prun , 9009 ) 
write ( f prun , 9 02 8 1 ) 

format ( 2 8x, 'RUN  TERMINATION  CRITERIA') 
wr i te ( f prun , 9 0 0 9 ) 

write (f prun, 9007)  (aster( j) , j=l,80) 
wr ite ( f prun , 9 0 09 ) 
write (f prun, 9029)  niter 

formate  Maximum  number  of  iterations  is  ',i6) 
write (f prun, 90282)  nfreq 

format ( ' Error  checking  frequency  is  every  ' , i3 , ' iterations ' ) 
wr ite ( f prun , 9 0 2 8 0 ) 

formate  is  terminated  on  either  max  iterations  or',/, 

' one  of  the  criteria  below.') 
write (f prun, 90283)  egoal 
format ( ' RMS  error  <=  ' , gl2 . 3 ) 
write (f prun, 90284)  gwgoal 

formate  (RMS  of  g)  <=  ',gl2.3,'  * (RMS  of  w)') 

write (f prun, 90285)  errdel, nfreq 

formate  (RMS  err)  > ',gl2.3,'  * (RMS  err  ',i4, 

' iterations  ago) ' ) 

end  if 
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4 deltal , delta2 , hder iv , oderiv , f pscr , f pgrph , f prun , 

& nfreq,errdel,egoal,gwgoal,iw2rt) 

close ( fpgrph) 
close (f pscr) 
c 

Ci:-k’k’k1ci:-k'k’k-kicicieie’kicic**’k’k-ki!ieieic*’k*-kicicrkieicierk‘kicic’ki:*‘1cicie'k'kicic-ki:ic‘kic’kic*ieicicic'kic'k*'k 


c end  of  weight  optimization 
c 

c write  out  run  information 
c 

c******************************* ********************* ************** 


9040 


9041 


9043 

9045 


9046 

9048 


c for 


9100 


write (*,9040) 
format ( * Output  phase . • ) 
iwrt  = 1 

call  func(. false.  ,numw,w, error, gw, numl,num2,wf actor, 

& vinp , vout , target , npats , ninp , nhid , nout , 

& maxpat , maxins , maxhid , maxout , ncalls , vhid , 

& deltal , delta2 , hderiv,  oderiv , iwrt , f prun) 

wr i t e ( f prun , 9 0 2 8 ) 

write (f prun, 9007)  (aster(j) , j=l,80) 

wr ite ( f prun , 9 0 0 9 ) 

wr ite ( f prun , 9 0 4 1 ) 

format ( 3 5x , • RUN  RESULTS » ) 

wr ite ( f prun , 9 0 09 ) 

write (f prun, 9007)  (aster(j) , j=l,80) 
write ( f prun ,9009) 
do  9045  np  = 1, npats 
do  9043  j = l,nout 

target (np,j)  = tmax  + (target(np, j)  - 0.99) /slope 
vout(np,j)  = tmax  + (vout(np,j)  - 0.99) /slope 
err(np,j)  = (target(np, j)-vout(np,  j) ) 
continue 
continue 
ermn  = 0.0 
targmn  = 0.0 
errmax  = l.e-20 
errmin  = l.e20 
do  9048  np  = 1, npats 
do  9046  j = l,nout 

ermn  = ermn  + err(np,j) 

targmn  = targmn  + target (np,j) 

if  (err(np,j)  .gt.  errmax)  errmax  = err(np,j) 

if  (err(np,j)  .It.  errmin)  errmin  = err(np,j) 

continue 

continue 

ermn  = ermn/float(np*j) 
targmn  = targmn/float(np*j) 
one  output  node  only 
do  9100  np  = 1, npats 
temp(np)  = err(np,l) 
continue 
std  = 0.0 
ssmn  = 0.0 
ssreg  =0.0 
do  90491  np  = 1, npats 
do  9049  j = l,nout 

std  = std  + (err(np, j)-ermn) **2 

ssmn  = ssmn  + (target(np,  j ) - targmn)  **2 
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9049 

ssreg  = ssreg  + (vout(np,j)  - targmn)**2 
continue 

90491 

continue 

std  = std/float(np*j  - l) 
std  = sqrt(std) 
rsq  = ssreg/ssmn 
write (fprun, 90490)  rsq 

90490  format (//, 2 8x, *R-SQ  of  nonlinear  fit  = *,gl5«7) 
write ( fprun, 90492 ) 

90492  format (//, 2 8x, *Pointwise  Error*) 
write ( f prun , 9 02 3 ) 

write ( f prun ,90493) 

90493  format(15x, 'Mean* ,27x, 'Standard  Deviation*) 
write (f prun, 90494)  ermn,std 

90494  format (9x,gl5.7,25x,gl5.7) 

c normalize  eror  distribution  and  compute  Kolmogorov-Smimov  statistic 
do  9105  np  = l,npats 

temp(np)  = (temp(np)  - ennn)/std 
9105  continue 

c 

c compute  Kolmogorov-Smimov  statistic  and  significance 
c 


90498 


90495 

& 

90496 


90500 

90503 
c 

c generate  error  histogreim  display  file 
c 

write (fpgrer, 90504)  err(np,l) 

90504  format(gl2*4) 

90505  continue 
c 

c display  k-s  statistic 
c 

write (f prun, 9110)  d,prob 

9110  formate  Kolmogorov-Smimov  distemce  statistic  = *,gl5.7,/, 
& * Kolmogorov-Smimov  significance  value  = *,gl5.7) 

close ( fpgrer) 
c 

c write  out  convergence  information  for  training  runs,  ignore  for 
c testiing  runs 
c 

if (niter  .gt.  0)  then 

call  endopt(fprun,  iter,  ncalls,  ierr,  error,  gw) 
c 

c compute  rms  change  in  the  weights  for  training  runs,  ignore  for 
c testing  runs 
c 


call  ksone(temp,npats,d,prob) 
wr ite ( f prun , 9 0 4 9 8 ) 

format (//,14x,  'Minimxim*  ,32x,  'Maximum') 
write (f prun, 90494)  errmin,errmax 
wr ite ( f prun , 9 02 3 ) 
write ( f prun, 90495) 

format (8x,  'Data*  ,7x,  'Output*  ,7x,  'Desired*  ,7x,  'Predicted* , 

7 X, 'Absolute* ) 
write ( f prun , 9 04 9 6 ) 

format (8x,  'Index*  ,7x,  'Node*  ,8x,  'Output*  ,8x,  'Output*  ,10x,  'Error* ) 
do  90505  np  = l,npats 
do  90503  j = l,nout 

write (f prun, 90500)  np, j , target (np,j) ,vout(np, j) ,err(np, j) 
format  (8x,  i4 , 8x,  i4 , 5x,gl2 . 4 , lx,gl2 . 4 , 7x,gl2 .4) 
continue 
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23020 


do  23020  n = 1,  numw 

wsav(n)  = wsav(n)  - w(n) 
continue 

dif  = snrm2  (numw,  wsav,  1)  / sqrt(float  (numw) ) 
write  (fpirun,  985)  dif 
c 

c skip  to  here  for  testing  runs 
c 

end  if 
c 

c********************************************* ********************** 
c 

c write  out  final  weights  for  training  runs,  ignore  for  testing 
c 

c***************************************************** ************** 


c 

c print  final  weights  for  training  runs 
c 

if (niter  .gt.  0)  then 

open(fpputw,  file  = fnputw,  status  - *\in)aiown‘) 
open  (fpgrwl,  file  = fngrwl , status  = 'unknown*) 
open (fpgrw2, file  = fngrw2 , status  = 'unknown*) 
call  putwts(fpputw, fpgrwl, fpgrw2,w,  w(numl+l)  ,ninp,nhid,nout) 
close  ( f pputw) 
close  (fpgrwl) 
close ( fpgrw2 ) 
write (fprun,  983)  fnputw 
end  if 
close (fprun) 
c 

c******************************************************************** 


c 

c end  of  the  run  loop.  Go  back  for  another  run  specification 
c if  any 
c 

c********* *********************************************** ************ 


c 

c loop  on  runs 
c 

23000  continue 
stop 
c 

C*******4r4r****  **4;***  **************************************  ********** 
C 

c formats 


c 

c**** ******************************************************** ******* 


c 

983 

format ( * 

984 

format ( ' 

985 

format ( * 

986 

format ( ' 

987 

format ( ' 

988 

format ( ' 

999 

format ( ' 

end 

Weights  written  to  file  ' , a40) 
Regularization  factor  wf actor  = ',  lpel2.3) 
Rms  change  in  weights',  f6.3) 

Random  initial  weights,  seed  ' , ilO) 

Initial  weights  from  file  ' , a41) 

'/al2,  a41) 

Doing  ',  i3,  ' run(s)') 


c******* ******************************************************* ******** 


c 

c setwts 

c 


102 


c********************************************************************** 

subroutine  setwts(fp,  wl,  w2,  nseed,ninp,nhid,nout) 
c********************************************************************** 
c 

c Set  up  the  initial  weights  for  the  neural  net 

c Test  on  the  unit  number  fp. 

c If  fp  > 0,  read  from  there. 

c If  fp  <=  0,  use  pseudo-random  weights  in  the  range 

c (-scale, scale) , but  first  initializing 

c the  generator  to  nseed. 

c wl  - weights  between  input  and  hidden  nodes 

c w2  - weights  between  hidden  and  ouput  nodes 

c 

c********************************************************************** 
parameter (scale  = 0.5) 
real  wl(nhid,  ninp+1) 
real  w2(nout,  nhid+1) 
integer  fp,  h,  i,  j,  minp,  mhid,  mout 
c 

c*********** ************************************************** ********* 
c 

c test  unit  number  fp  for  whether  to  read  weights  or  generate 
c random  weights 
c 

c if  fp  > 0 then  read  weights 
c 

0 4r  4: 4r  4:  * * * 4:  * 4r  * 4c  ir  * * * * * * * * 4r  4r  * * 4:  * * * 4r  ib  4r  * A * 4: 4;  * 4: 4r  4: 4r  4;  4r  4;  * 4r  4: 4c  4: 4;  4: 4r  4r  4c  * * ir  4;  4;  4;  * 4:  * 4:  ic  4r 

C 

if(fp  .gt.  0)  then 
c 

c********************************************************************* 

c 

c come  to  this  section  to  read  previously  stored  weights 
c 

C***4c**4r**4c******4r*******4c*****4c****4r**4c4:********4c***4r**4c*4r**4c4c4c*4c**4c* 

C 

c first  get  number  of  input,  hidden  and  output  nodes  associated  with 
c the  weights 
c 

read(fp,  *)  minp,  mhid,  mout 
c 

c test  these  against  network  parameter  specs  for  this  run 
c 

if((ninp  .ne.  minp)  .or.  (nhid  .ne.  mhid)  .or.  (nout 
& .ne.  mout))  then 

if(ninp  .ne.  minp)  then 

print  *,  * Saved  network  has  *,  minp,  * inputs;  using  *, 
& ninp 

if (nhid  .ne.  mhid)  then 

print  *,  * Saved  network  has  *,  mhid, 

& ' hidden  nodes ; using  ' , nhid 

if (nout  .ne.  mout)  then 

print  *,  * Saved  network  has  *,  mout, 

& • outdput  nodes ; using  * , nout 

stop 
end  if 
end  if 
end  if 
end  if 
c 
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c if  network  parameters  match  then 
c read  input  to  hidden  nodes  weights 
c 

do  23040  h = 1,  nhid 

read(fp,  *)  (wl(h,i),  i = 1,  ninp+1) 

23040  continue 

c 

c then  read  hidden  to  output  nodes  weights 
c 

do  23042  j = 1,  nout 

read(fp,  *)  (w2(j,h),  h = 1,  nhid+1) 

23042  continue 

c 

Qic'kic'k’kic1iic4c1eie1eieicie'k1c1c1c1eie1cic1c"kic1c1c1eicici:ie1cie'k1e1c1c’kieicif±±ic'k’kic1cii1i'kieicieicieieieieicicieicie'k 

C 

c return  after  weights  have  been  read  in 
c 

c otherwise 
c 

c if  fp  = 0 come  here  to  generate  random  weights 
c in  the  range  (-scale, scale)  where  scale  is  set  at  the  beginning 
c of  this  subroutine  as  a parameter, 
c 

0*********************************** ******************************** 
c 

else 

c 

c initialize  random  numbers  to  given  seed 
c 

z = uni(nseed) 
c 

c set  input  to  hidden  nodes  weights 
c 

do  23044  h = 1,  nhid 

do  23046  i = 1,  ninp+1 

wl(h,i)  = 2.0  * scale  *(uni(0)  - 0.5) 

23046  continue 

23044  continue 

c 

c set  hidden  to  output  nodes  weights 
c 

do  23048  j = 1,  nout 

do  23050  h = 1,  nhid+1 

w2(j,h)  = 2.0  * scale  *(uni(0)  - 0.5) 

23050  continue 

23048  continue 

c 

Cle‘k1cic-k1t‘k1e1f1c*1e1ticik1e1c'k1e1:1cie1:1c1e1c*icit1(1e'k’kic1:1eic1(ic1e1e1f-k1c1e‘k1(1c1e1e1c1:’k1e'k1e1c1e1e1fi(1e1c1iieic 

C 

c return  section 
c 

+ + 4c  ********************  A******* 

C 

end  if 
return 
end 
c 

c* *********************************************** ********************** 
c 

c getpat 
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c 

c********** ************************************************************ 
subroutine  getpat ( f ppat , data , idatf 1 , vinp , target , npats , ninp , nout , 
1 maxpat , maxins , maxout , itherm , idum , slope , tmax) 

0********************************************************************** 

c 

c Read  the  input,  the  target  for  each  pattern 
c 

c This  subroutine  depends  on  the  particular  data  set  formats 
c 

c The  current  data  set  includes  eixis  position,  machine  tool  error 
c at  the  position  and  40  thermocouple  readings 
c 

0********************************************************************** 
integer  fppat 

real  vinp (maxpat , maxins+l ) , data (maxpat , maxins+1 ) 
real  target (maxpat,  maxout) 
integer  i,  j,  p,  idatfl (maxins+l) 
character*10  posit, diff 
integer  itherm (maxins ) , idum (maxpat) 
c 

c read  number  of  data  patterns  in  file,  number  of  input  variables, 
c niomber  of  output  variables  cind  a scale  factor  for  the  output  data, 
c This  scale  factor  is  used  to  produce  the  correct  units  if  necessary, 
c This  section  also  checks  against  specifications.  If  there  are 
c discrepancies  then  it  reports  and  stops, 
c 

read (fppat,  *)  mpats,  minp,  mout,  scale 

if(mpats  .It.  npats  .or.  ninp  .ne.  minp  .or.  nout 
& .ne.  mout)  then 

if (npats  .gt.  mpats)  then 

print  *,  * File  has  *,  mpats,  • patterns;  using  *,  npats 
end  if 

if (ninp  .ne.  minp)  then 

print  *,  * File  has  •,  minp,  * inputs;  using  *,  ninp 
end  if 

if (nout  .ne.  mout)  then 

print  *,  • File  has  »,  mout,  * outdput  nodes;  using  *,  nout 
end  if 
end  if 
c 

c read  header  line 
c 

read(fppat,*)  posit, diff , (itherm(j) , j=l,3) 
c 

c get  first  set  of  data,  scale  the  nominal  axis  position 
c and  map  the  target  output  to  the  interval  (0,1) 
c 

tmin  = l.eSO 
tmax  = -l.e30 
do  80  p = 1, npats 

read(fppat,*)  idum(p) , data(p,l),  target(p,l), 

1 (data(p,j) ,j=2,4) 

if (target (p,l)  .le.  tmin)  tmin  = target (p,l) 
if  (target  (p,l)  .ge.  tmax)  tmax  = target  (p,l) 

80  continue 

c 

c transform  target  data  to  lie  between  0.01  and  0.99  in  order  to 
c match  sigmoid  range 
c 
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slope  = 0.98/ (tmax-tmin) 
do  85  p = l,npats 

target (p,l)  = slope* (target (p, 1) 
continue 


tmax)  + 0.99 


85 
c 

c get  the  next  6 groups  of  thermocouple  data 
c 

do  100  k = 1,6 

read(fppat,*)  (itherm(j),  j=4+(k-l) *6,9+(k-l)  *6) 
do  90  p = l,npats 

read(fppat, *)  (data(p, j) , j=5+(k-l) *6,10+(k-l)  *6) 

90  continue 

100  continue 

c 

c get  data  for  the  40-th  thermocouple 
c 

read(fppat, *)  itherm(40) 
do  110  p = l,npats 
read(fppat, *)  data (p, 41) 

110  continue 

c 

c copying  nonzero  columns  to  vinp 
c 

minp  = 0 
do  182  j = 1,41 

if  (idatfl(j)  .ne.  0)  then 
minp  = minp+1 
do  181  p = l,npats 

vinp (p, minp)  = data(p,j) 

181  continue 
endif 

182  continue 
c 

c the  last  weight  of  each  hidden  node  is  really  its  bias, 
c so  its  input  value  is  1 
c 

do  183  p = l,npats 
vinp (p, minp+1)  = 1.0 

183  continue 
ninp  = minp 
return 

end 

c************ ************************* ********************************* 

c 

c func 

c 

c*** ******************************************************************* 
subroutine  func(dograd,  numw,  wt,  err,  gm,numl,num2,wf actor, 

1 vinp , vout , target , npats , ninp , nhid , nout , 

2 maxpat , maxins , maxhid , maxout , ncalls , vhid , 

3 del  tal , delta2 , hder  iv , Oder  iv , iwrt , f prun) 

C* ****** ***4r  ************************************************ *********** 

C 

c This  subroutine  regularizes  the  error  and  negative  gradient  returned 
c from  the  neural  network  evaluation  function  forward 
c 

C**********ir4r**********  *************************************  *********** 

logical  dograd 

real  wt(*) , err,  gm(*) 

real  vinp  ( maxpat,  maxins+1) , vout  (maxpat,  maxout) 
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real  target (maxpat , maxout ) , vhid (maxpat , maxhid+1 ) 
real  deltal (maxpat, maxhid) , del ta2 (maxpat, maxout) 
real  hderiv(maxhid) , Oder iv (maxout) 
integer  fprun 
c 

c Evaluate  the  neural  net  function 
c 

call  forward (dograd,  wt,  wt(numl+l),  err,  gm,  gm(n\aml+l)  ,vinp, 

1 vout , target , npats , ninp , nhid , nout , maxpat , maxins , 

2 maxhid,  maxout,  ncal  Is,  vhid,  deltal,  del  ta2  ,hderiv, 

3 Oder  i V , iwrt , fprun ) 
c 

c average  the  norm  squared  of  the  weight  vector, 
c 

wsq  = sdot(numw,  wt,  1,  wt,  1)  / (2  * numw) 
c 

c This  is  the  Tychonov  regularization  step,  wf actor  is  the 
c regularization  parameter 
c 

err  = err  + wfactor  * wsq 
wf  = wfactor  / numw 


c 

c 

c 

c 

c 


return  the  regularized  negative  gradient  if  needed.  Note  the 
gm  is  already  negative  on  return  from  forward.  The  sign 
just  propagates  the  negative  to  the  second  term. 

then 


if (dograd  .eq.  .true.) 
do  23105  n = 1,  numw 

gm(n)  = gui(n)  - wf  * wt(n) 

23105  continue 

end  if 
return 
end 

Qicic'kiclcicicicicicle'kii'kiclclfk’k'kic'klc'kic^ciiicicic'k-kiciticrk'k-kiciclc-k'k'kiclc'kiclcicicicicicicicicicis'k’kic'ieic’kisicicicic 

C 

c forward 

c 

icificidc  Icicle  1:  it  1e1e1ei(1(  Icicle  Icitlelcleie  1c  isle -k  Icicle  hie  1c  Icicle  Iclchh  Icicle  Icicle  Ic’klc'kiclele-kielc’k 

sxibroutine  forward  (dograd,  wl,  w2,  error,  gml,  gm2,vinp, 

1 vout , target , npats , ninp , nhid , nout , maxpat , 

2 mcLxins , maxhid , maxout , ncalls , vhid , deltal , 

3 delta2 , hderiv , Oder iv , iwrt , fprun) 

C.1c1:1e1e1c1e1c1e1e1e1e1c1c1i1e1c1e1c1e1c1e1e1c1e1e1e1e1e1c1e1e1e1e1c1c1e1e1e1e1e1e1e1e1c1c1c1e1c1e1c1e1e1c1e1c1e1e1e1e1c1e1c1e1c1c1e1e1e1e1c 
C 
C 
C 
C 
C 
C 
C 

Clc1c1c1:1c1e1c1e1c1c1e1e1e1e1e1c1e1e1e1c1e1e1c1e1e1c1e1e1e1e1e1e1e1e1e1c1e1e1e1c1c1e1c1e1e1e1c1e1e1e1c1e1e1c1e1e1e1e1e1e1e1c1c1e1e1e1c1c1e1c 

logical  dograd 
real  wl(nhid,  ninp  + 1) 
real  w2(nout,  nhid  + 1) 
real  gml(nhid,  ninp  + 1) 
real  gm2(nout,  nhid  + 1) 
real  vinp (maxpat,  maxins+1) 
real  vout (maxpat,  maxout) 
real  target (maxpat,  maxout) 
real  vhid (maxpat,  maxhid+1) 


Calculate  outputs  and  (optionally)  negative  gradient.  This  is  the 
main  neural  net  evaluation  function.  It  presumes  a fully  connected 
feedforward  neural  net  with  two  active  layers,  a hidden  one  and 
an  output  one.  The  unit  activation  fxinctions  are  the  sigmoid 
functions . 
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real  delta2  (maxpat,  laaxout) 
real  del tal (maxpat,  maxhid) 
real  hderiv (maxhid) 
real  oderiv(maxout) 
integer  h,  i,  j,  p,  fprun 
c 

c SMIN  should  be  small  enough  so  that  exp(SMIN)  is  negligible, 
c but  large  enough  so  that  exp(-SMIN)  does  not  overflow, 
c 

parameter  (smin  = -40.0) 
c 

c accumulate  calls  to  the  neural  net  evaluator 
c 

ncalls  = ncalls  +1 
c 

c initialize  error  fo  accumulation 
error  = 0.0 
c 

c The  object  of  this  loop  over  all  patterns  is  to  compute  the  sum 
c of  the  output  errors  over  all  of  the  patterns  and  to  optionally 
c compute  the  network  delta  factors  for  each  network  layer  as  a 
c function  of  each  pattern, 
c 

do  23107  p = 1,  npats 
do  23109  h = 1,  nhid 
c 

c Accumulate  the  input  signals  to  each  hidden  layer  node  as  the 
c inner  product  of  the  vector  of  weights  on  all  of  the  links 
c from  the  input  nodes  and  the  vector  of  all  of  the  inputs  for  the 
c p-th  pattern,  sdot  is  the  dot  product  function  from  the  BIAS 
c package  and  essentially  performs 
c sum  = 0 

c do  i = 1,  ninp+1 

c sum  = sum  + wl(h,i)  * vinp(p,i) 

c 

sum  = sdot(ninp+l,  wl(h,l),  nhid,  vinp(p,l),  mcixpat) 
c 

c assume  the  exponential  sigmoid  function  for  the  activation  function 
c and  evaluate  the  output  at  each  hidden  node  as  a function  of  the 
c input  signal.  Test  for  underflow  first, 
c 

if (sum  .ge.  smin)  then 

vhid(p,h)  = 1.0  / (1.0  + exp(-sum)) 

else 

vhid(p,h)  = 0.0 
end  if 
c 

c compute  the  derivative  of  the  hidden  node  output  with  respect  to 
c the  total  input  signal,  i.e.  partial  of  vhid  w.  r.  t.  sum. 
c 

hderiv (h)  = vhid(p,h)*(1.0-vhid(p,h) ) 

23109  continue 

c 

c the  last  weight  of  each  output  node  is  really  its  bias, 
c so  its  output  value  is  1 
c 

vhid(p,nhid+l)  = 1.0 
c 

do  23113  j = 1,  nout 
c 
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c Compute  the  input  signal  to  each  output  node  as  the  inner  product  of 
c the  weight  vector  between  the  hidden  nodes  and  the  output  nodes 
c and  the  output  vector  of  values  from  the  hidden  nodes,  sdot  performs 
c 

c sum  = 0 

c do  h = 1,  nhid+1 

c sum  = sxim  + w2(j,h)  * vhid(p,h) 

c 

sum  = sdot(nhid+l,  w2(j,l),  nout,  vhid(p,l),  maxpat) 
c 

c assxime  the  exponential  sigmoid  function  as  the  activation  function 
c at  each  output  node.  Test  for  underflow, 
c 

if (sum  .ge.  smin)  then 

vout(p,j)  = 1.0  / (1.0  + exp (-sum)) 

else 

vout(p,j)  = 0.0 
end  if 
c 

c compute  the  derivative  of  the  output  with  respect  to  the  total  input 
c signal,  i.e.  partial  of  vout  w.  r.  t.  sum. 
c 

oderiv(j)  = vout(p,  j)  *(1.0-vout(p,  j) ) 
c 

c accumulate  error  function  over  the  patterns.  The  division  by  2 in 
c the  error  function  definition  is  done  once  below, 
c 

error  = error  + (target(p,j)  - vout(p,j))  **  2 
23113  continue 

c 

c if  the  negative  gradient  is  optionally  required  generate  the 
c the  network  deltas, 
c 

if(dograd  .eg.  .false.)  goto  23117 
c 

c output  deltas 
c 

do  23119  j = 1,  nout 

delta2(p,j)  = (target (p,j)  - vout(p,j))  * oderiv(j) 
23119  continue 

c 

c hidden  deltas 
c 

do  23121  h = 1,  nhid 
c 

c perform  an  inner  product  here  with  sdot 
c 

c sum  = 0.0 

c do  j = 1,  nout 

c sum  = sum  + delta2(p,j)  * w2(j,h) 

c 

sum  = sdot(nout,  delta2(p,l),  maxpat,  w2(l,h),  1) 
deltal(p,h)  = sum  * hderiv(h) 

23121  continue 

c 

c this  is  the  end  of  the  loop  over  all  patterns 
c 

23117  continue 
23107  continue 
c 
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c Average  error  per  node  over  all  patterns  and  nodes 
c 

div  = (npats  * nout) 
error  = error  / (2  * div) 
if(dograd  .eq.  .false.)  goto  23123 
c 

c optionally  generate  the  negative  gradients  with  respect  to  the 
c link  weights 
c 

c first  calculate  negative  gradient  of  error 
c with  respect  to  input  weights 
c 

do  23125  h = 1,  nhid 

do  23127  i = 1,  ninp+1 

gml(h,i)  = sdot(npats,  deltal(l,h),  1,  vinp(l,i),  1)  / 
& div 

23127  continue 

23125  continue 

c 

c finally  with  respect  to  output  weights 
c 

do  23129  j = 1,  nout 

do  23131  h = 1,  nhid+1 

gm2(j,h)  = sdot(npats,  delta2(l,j),  1,  vhid(l,h),  1)  / 
& div 

23131  continue 

23129  continue 

23123  continue 
return 
end 

c********************************************************************** 

c 

c putwts 

c 

C-k1ci(1e'k1e1c1c1c1c1e‘k-kii1c-k1c1c'k-k1c1ci:1c'k'k1c1cic1e1cic1cic1c1c±ieicieiciti:1cicieic1c‘kie1cic"k1(icii1c1iiiie'kieicicicicic’kic'ic 

subroutine  putwts (fp,fpgrwl,fpgrw2,  wl,  w2,ninp, nhid, nout) 

qIs  "ki:  Ic  "k  ic  1c  Iclcic  1e  "k  Icic  1c  ^lieic'kleie’k  Icicle  •kiefc'kic1e1citic'kicrkieic-k1c1cic'k'kic1c"kicic’kicicicicic"kic^:ic"k'kieieic  4c 
C 

c Write  the  weights  to  file  with  unit  number  fp. 

c Weights  for  each  node  start  on  a new  line, 

c 

c Write  weights  to  graphics  file 

c 

Qk  k k k k k k kkic  kiclcicicieicic  it  kiciciciciclclcicicicicicicicicicicicicicicicic  kick  kicicicicic  kick  kicicicicicicicicicicicicicic 

integer  f p , f pgrwl , f pgrw2 
real  wl(nhid,  ninp+1) 
real  w2(nout,  nhid+1) 
integer  h,  i,  j 

write(fp,  998)  ninp,  nhid,  nout 
do  23133  h = 1,  nhid 
write(fp,  999)  (wl(h,i),  i = 1,  ninp+1) 

23133  continue 

do  23135  j = 1,  nout 
write(fp,  999)  (w2(j,h),  h = 1,  nhid+1) 

23135  continue 

noutpl  = nout  + 1 
nipl  = ninp  + 1 
nip2  = ninp  + 2 
nhpl  = nhid  + l 
izero  = 0 
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write (fpgrwl, 9000)  nip2,nhpl 
9000  format (i6, lx, i6) 

write (fpgrwl, 9010)  izero, (i,i=l,nhid) 

9010  format (i6, lOOilS) 

do  23200  i = l,nipl 

write (fpgrwl, 9020)  i, (wl(h,i) ,h=l,nhid) 

9020  format(i6,100el5.6) 

23200  continue 

write(fpgrw2,9000)  nhpl,noutpl 
do  23250  h = l,nhpl 

write(fpgrw2,9030)  h,w2(l,h) 

9030  format(i6,el5.6) 

23250  continue 

return 

998  format(3i5) 

999  format (lx,5el4. 6) 

end 

c************* ********************************************************* 

c 

c endopt 

c 

0********************************************************************** 
subroutine  endopt (fprxin,  iter,  ncalls,  ierr,  err,  gw) 

c*********************************************************** *********** 

c 

c Do  some  output  at  the  end  of  the  optimization 

c 

0************ ************************************************ ********** 
integer  fprun 
c 

c if  ierr  is  <=  0 signal  error  goal  achieved 
c 

assign  900  to  ifmt 
c 

c if  ierr  = 1 then  hit  iteration  limit  without 
c convergence  within  specified  goals 
c 

if (ierr  .eg.  1)  then 
assign  901  to  ifmt 
c 

c if  ierr  = 2 then  run  termination  due  to  small  gradient 
c 

else  if (ierr  .eq.  2)  then 
assign  902  to  ifmt 
c 

c if  ierr  = 3 then  error  return  due  to  slow  convergence 
c 

else  if (ierr  .eq.  3)  then 
assign  903  to  ifmt 
end  if 
c 

write (*,  ifmt)  iter,  ierr 
write (fprun,  ifmt)  iter,  ierr 
write (*,  989)  iter,  ncalls,  err,  gw 
write (fprun,  989)  iter,  ncalls,  err,  gw 
c 

return 

c 

900  formate  Iter*,  i6,  •;  ierr  *,  il,  * : achieved  error  goal*) 

901  formate  Iter*,  i6,  *;  ierr  *,  il,  * : iteration  limit*) 
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: gradient  small  * ) 


\ i6,  * function  calls;  Err 


902  format(*  Iter*,  i6,  ierr  *,  ii, 

903  format(*  Iter*,  i6,  *;  ierr  *,  ii, 

& * : slow  convergence  of  error  * ) 

989  formate*  Used*,  i6,  * iterations; 

& f6.3,*;  |g|/|w|  *,  lpe9.3) 

end 

Q-kic'kic-kic-klcieieiticIt’k’k’k’k-klcic’kliicic-kiciciciciticieicicieic’k-k’kieic-k-krk’k-k-k-kic-kic-kicieidc’k-kic-k’kic’kicie’k'k-k-kie 

C 

c uni 

C 

C******  ****************  *i:*f:ic**ie*-kic*ic***ik*ic1c***-k**ic******  *******  ******** 

real  function  uni(jd) 

c************************************** ******************************** 

c***revision  date  810915 
c***category  no.  g4a23 

random  numbers,  uniform  random  numbers 
blue,  jauaes,  scientific  computing  division,  nbs 
kahaner,  david,  scientific  computing  division,  nbs 
marsaglia,  george,  computer  science  dept. , wash  state  univ 


this  routine  generates  quasi  uniform  random  numbers  on  [0,1) 
and  can  be  used  on  any  computer  with  at  least  16  bit  integers, 
e.g.,  with  a largest  integer  at  least  32767. 


use 


c***keywords 
c***author 
c 
c 
c 

c***purpose 
c 
c 

c*  * *descr iption 
c 

c this  routine  generates  quasi  uniform  random  numbers  on  the  interval 

c [0,1).  it  can  be  used  with  any  computer  with  at  least  16  bit 

c integers,  e.g.,  with  a largest  integer  at  least  equal  to  32767. 

c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 


first  time.... 

2 = uni(jd) 

here  jd  is  any  non-zero  integer, 
this  causes  initialization  of  the  progreim 
and  the  first  random  number  to  be  returned  as  z. 
subsequent  times . . . 

z = uni(O) 

causes  the  ne^ct  random  number  to  be  returned  as  z. 
machine  dependencies . . . 

mdig  *=  a lower  bound  on  the  nximber  of  binary  digits  availcible 
for  representing  integers,  this  value  is  defaulted 
internally  to  16,  but  may  be  increased  in  line  with 
remark  a below. 

remarks ... 

a.  this  program  can  be  used  in  two  ways: 

(1)  to  obtain  repeatable  results  on  different  computers, 
set  *mdig*  to  the  smallest  of  its  values  on  each,  or, 

(2)  to  allow  the  longest  sequence  of  random  numbers  to  be 
generated  without  cycling  (repeating)  set  *mdig*  to  the 
largest  possible  value. 

b.  the  sequence  of  numbers  generated  depends  on  the  initial 

input  *jd*  as  well  as  the  value  of  *mdig*. 
if  mdig=16  (the  default)  one  should  find  that 
the  first  evaluation 

2=uni(305)  gives  2=. 027832881. . . 
the  second  evaluation 

2=uni(0)  gives  z=. 56102176. . . 
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c the  third  evaluation 

c z=uni(0)  gives  z=. 41456343 , 

c the  thousandth  evaluation 

c z=uni(0)  gives  z=. 19797357 .. . 

c 

c***references  (none) 
c***routines  called  (none) 
c***end  prologue  uni 
c 

Qicicicicicicicielc'kicicicicicicicicicicic'k-k-kleieicic-kic’kic'kicrk’kicie'kicicicic'k’kicic'k’kiciticicic’k’k'k-krk'krk’kicic’k-kicic’kicic-kic-k-k-k-k 

C 

integer  in(17) 
c 

save  i,  j ,in,iiil,in2 
c 

c note...  if  a fortran  77  compiler  is  not  availcdjle,  the  preceding 
c save  statement  should  be  removed, 

c 

comment  data  mdig  / 16  / 

data  mdig  / 32  / 

data  m(l) ,m(2) ,m(3) ,m(4) ,m(5) ,m(6) ,m(7)  ,m(8)  ,m(9)  ,m(10) ,m(ll) , 

1 m(12) ,m(13) ,m(14) ,m(15) ,m(16) ,m(17) 

2 / 30788,23052,2053,19346,10646,19427,23975, 

3 19049,10949,19693,29746,26748,2796,23890, 

4 29168,31924,16499  / 
data  ml,m2,i,j  / 32767,256,5,17  / 

c***first  executable  statement  uni 
if(jd  .eg.  0)  go  to  3 

c fill 

ml  = 2**(mdig-2)  + (2** (mdig-2) -1) 
m2  = 2**(mdig/2) 
jseed  = min0(iabs(jd) ,ml) 
if ( mod(jseed,2) .eq.O  ) jseed=j seed-1 
kO  =mod(9069,m2) 
kl  = 9069/m2 
jo  = mod  (jseed,  m2) 
jl  = jseed/m2 
do  2 i=l,17 
jseed  = j0*k0 

jl  = mod( jseed/m2+j0*kl+jl*k0,m2/2) 
jo  = mod (jseed, m2) 

2 m(i)  = j0+m2*jl 
i=5 

j=17 

c begin  main  loop  here 

3 k=m(i)-m(j) 

if(k  .It.  0)  k=k+ml 

m(j)=k 

i=i-l 

if(i  .eg.  0)  i=17 
3=3-1 

if(j  .eg.  0)  j=17 
uni=f loat (k) /float (ml) 
return 
end 

0 4r  4r  4:  ir  4r  4r  4r  4r  4r  4: 4;  4r  4r  4r  4;  A ir  4r  4:  i;  4r  4;  4r  4: 4r  4;  4: 4;  4r  4r  * 4: 4;  4r  4:  * 4: 4: 4;  4: 4: 4;  4;  4: 4: 4;  4: 4;  4: 4r  4;  4r  4r  ir  ir  4: 4r  4; 

C 

c sdot 

C 

C 4r  4: 4: 4;  4r  4r  4: 4: 4: 4;  4r  4r  4r  4r  4r  4: 4r  4r  4r  4r  4r  4r  4r  4r  4: 4r  4r  4r  4: 4: 4r  4r  4r  4;  4r  4r  4r  4r  4r  4r  4;  4r  4c  4;  4r  4r  4;  4r  4r  4r  4r  4r  4: 4c  4r  4r  4r  4: 4r  4r  4r  4r  4r  4;  4r  4r  4r  4r  4; 
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REAL  FUNCTION  SDOT(N,SX, INCX^SY, INCY) 

C**  ******•!:  ************  ************************************************* 

C 

c To  compute  the  inner  product  between  vectors  sx  and  sy  with 
c separate  increments.  This  routine  from  the  BLAS  library, 
c 

C********************************************************************** 

REAL  SX(*) ,SY(*) 

C***FIRST  EXECUTABLE  STATEMENT  SDOT 
SDOT  = O.OEO 
IF(N.LE.O) RETURN 

IF ( INCX . EQ . INCY ) IF ( INCX-1 ) 5,20,60 
5 CONTINUE 
C 

C CODE  FOR  UNEQUAL  INCREMENTS  OR  NONPOSITIVE  INCREMENTS. 

C 

IX  = 1 
lY  = 1 

IF(INCX.LT.O)IX  = (-N+1)*INCX  + 1 
IF(INCY.LT.0)IY  = (-N+1)*INCY  + 1 
DO  10  I = 1,N 

SDOT  = SDOT  + SX(IX)*SY(IY) 

IX  = IX  + INCX 
lY  = lY  + INCY 
10  CONTINUE 
RETURN 
C 

C CODE  FOR  BOTH  INCREMENTS  EQUAL  TO  1 

C 

C 

C CLEAN-UP  LOOP  SO  REMAINING  VECTOR  LENGTH  IS  A MULTIPLE  OF  5. 

C 

20  M = MOD(N,5) 

IF(  M .EQ.  0 ) GO  TO  40 

DO  30  I = 1,M 

SDOT  = SDOT  + SX(I)*SY(I) 

30  CONTINUE 

IF(  N .LT.  5 ) RETURN 
40  MPl  = M + 1 

DO  50  I = MP1,N,5 

SDOT  = SDOT  + SX(I)*SY(I)  + SX(I  + 1)*SY(I  + 1)  + 

1 SX(I  + 2)*Sy(I  + 2)  + SX(I  + 3)*SY(I  + 3)  + SX(I  + 4)*SY(I  + 4) 
50  CONTINUE 
RETURN 
C 

C CODE  FOR  POSITIVE  EQUAL  INCREMENTS  .NE.l. 

C 

60  CONTINUE 

NS=N*INCX 

DO  70  1=1, NS, INCX 

SDOT  = SDOT  + SX(I)*SY(I) 

70  CONTINUE 
RETURN 
END 

Cifk*************** ***************************************************** 
C 

c snrm2 

c 

c* ***************************************************************  ****** 
REAL  FUNCTION  SNRM2 (N,SX,INCX) 
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oo  o onoo  nnnn  non  on  non 


0********************************************************************** 

c 

c This  function  computes  the  norm  of  the  vector  sx  with  increment 
c allowed.  The  routine  is  from  the  BIAS  library, 
c 

c********************************************************************** 
INTEGER  NEXT 

REAL  SX(*),  CUTLO,  CUTHI,  HITEST,  SUM,  XMAX,  ZERO,  ONE 
DATA  ZERO,  ONE  /O.OEO,  l.OEO/ 

C 

DATA  CUTLO,  CUTHI  / 4.441E-16,  1.304E19  / 

C***FIRST  EXECUTABLE  STATEMENT  SNRM2 
IF(N  .GT.  0)  GO  TO  10 
SNRM2  = ZERO 
GO  TO  300 
C 

10  ASSIGN  30  TO  NEXT 
SUM  = ZERO 
NN  = N * INCX 

C BEGIN  MAIN  LOOP 

1 = 1 

20  GO  TO  NEXT, (30,  50,  70,  110) 

30  IF(  ABS(SX(I))  .GT.  CUTLO)  GO  TO  85 
ASSIGN  50  TO  NEXT 
XMAX  = ZERO 

PHASE  1.  SUM  IS  ZERO 

50  IF(  SX(I)  .EQ.  ZERO)  GO  TO  200 

IF(  ABS(SX(I))  .GT.  CUTLO)  GO  TO  85 

PREPARE  FOR  PHASE  2. 

ASSIGN  70  TO  NEXT 
GO  TO  105 

PREPARE  FOR  PHASE  4. 

100  I = J 

ASSIGN  110  TO  NEXT 
SUM  = (SUM  / SX(I))  / SX(I) 

105  XMAX  = ABS(SX(I)) 

GO  TO  115 

PHASE  2.  SUM  IS  SMALL. 

SCALE  TO  AVOID  DESTRUCTIVE  UNDERFLOW. 

70  IF(  ABS(SX(I))  .GT.  CUTLO  ) GO  TO  75 

COMMON  CODE  FOR  PHASES  2 AND  4. 

IN  PHASE  4 SUM  IS  LARGE.  SCALE  TO  AVOID  OVERFLOW. 

110  IF(  ABS(SX(I))  .LE.  XMAX  ) GO  TO  115 

SUM  = ONE  + SUM  * (XMAX  / SX(I))**2 
XMAX  = ABS(SX(I)) 

GO  TO  200 

115  SUM  = SUM  + (SX(I)/XMAX)**2 
GO  TO  200 
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C PREPARE  FOR  PHASE  3. 

C 

75  SUM  = (SUM  * XMAX)  * XMAX 
C 

c 

C FOR  REAL  OR  D.P.  SET  HITEST  = CUTHI/N 

C FOR  COMPLEX  SET  HITEST  = CUTHI/(2*N) 

C 

85  HITEST  = CUTHI/FLOAT(  N ) 

C 

C PHASE  3.  SUM  IS  MID-RANGE.  NO  SCALING. 

C 

DO  95  J =I,NN,INCX 

IF(ABS(SX(J))  .GE.  HITEST)  GO  TO  100 
95  SUM  = SUM  + SX(J)**2 
SNRM2  = SQRT(  SUM  ) 

GO  TO  300 
C 


200  CONTINUE 

1=1+  INCX 

IF  ( I .LE.  NN  ) GO  TO  20 
C 

C END  OF  MAIN  LOOP. 

C 

C COMPUTE  SQUARE  ROOT  AND  ADJUST  FOR  SCALING. 

C 


SNRM2  = XMAX  * SQRT(SUM) 

300  CONTINUE 
RETURN 
END 

c********************************************************************** 


c 

c optwts 

c 

Cle1c1e'1c1e1f1c1e1i1c'k1e-ki:i:1e1c±1:1:'k1c1e1c1:'k1c1fk±1c1c’k1e1c'k1c1c1e1c1c1c’k'kit1eie1c1i1e1c*1:ic’k1c1e1e'k1cici:i:icrkieiciticic 


subroutine  optwts (itermax,  num,  w,  rmserr,  gw,  iter,  ierr,nuial, 

1 nu2n2 , wf  actor , vinp , vout , target , npats , ninp , nhid , 

2 nout , maxpat , maxins , maxhid , meixout , naxwsize , 

3 ncalls,gin,wnew,p,r,s,vhid,deltal,delta2, 

4 hderiv,  oderiv,  fpscr , fpgrph,  fprun,nf  req, 

& errdel , egoal , gwgoal , iwrt ) 

c********************************************************************** 


c 

c Solve  neural  net  least  squares  problem  by  scaled  conjugate  gradients, 

c Return  weight  vector,  error,  |g|/|w|. 

c Stop  if  emy  of  the  following  is  true  (return  value  as  ierr) 

c 0)  rmserr  <=  egoal 

c 1)  Used  itermax  iterations. 

c 2)  Size  of  gradient  vector  < GWRATIO  * size  of  weight  vector 

c 3)  Error  hasn't  gone  down  by  EFACTOR  in  nfreq  iterations 

c 

Clc1e1c1e1cic1t1c1t1e’k1cii1e'k1c1c1e1c1c1e1c1c1c1eieic1e1eicicicii4:1e1e1e1cie1c1e1f1t±1cicickiei:1c’k1cit1e'k1f'ki:1c1eit1cieicieic'ki:1c 

c the  weights 

real  w(num) 
c 

real  vinp (maxpat , maxins+l ) 
real  vout ( maxpat, maocout) 
real  target  (maxpat , maocout ) 
real  vhid (maxpat , maxhid+1 ) 
real  del ta2  (maxpat , maocout ) 


116 


real  deltal (roaxpat^maxhid) 
real  hderiv(iaaxhid) 
real  oderiv(inaxout) 
c negative  gradients  of  error  wrt  w*s 
real  gin(inaxwsize) 

c new  weights  for  temporary  storage 
real  wnew(maxwsize) 
c direction  vector 

real  p(maxwsize) 
c remembered  negative  gradient 
real  r(maxwsize) 

c second  derivative  info  along  p direction 
real  s(maxwsize) 

c number  of  steps  since  last  restart 
integer  icount 

c number  of  consecutive  failures 
integer  fcovuit 
integer  fpscr,  fpgrph,  fprxin 
logical  success 
c 

c starting  value  for  xl  (or  lambda) . This  is  lambda_l. 
c 

parameter  (xl start  = 0.01) 
c 

c************************************************************* 

c 

c *******  INITIALIZATION  SECTION  ******** 

c 

c************************************************************ 

c 

c test  whether  the  maximum  number  of  iterations  is  positive, 
c If  not  exit  with  the  computed  function  value  (error)  without  the 
c negative  gradient,  itermax  is  set  to  0 when  running  tests, 
c 

if  (itermax  .gt.  0)  then 
c 
c 
c 

c When  itermaoc  > 0: 

c Get  initial  function  value  and  negative  gradient 
c Compute  initial  rms  error  and  zero  iteration  counter 
c Set  counters  and  set  initial  search  direction 
c 
c 
c 

write (fprun,  999)  itermax,  num 

999  formate  iterations  = *,i6,*  # of  weights  = */i6) 

c 

c**************************************************************** 

c 

c **********  get  initial  function  and  negative  gradient  ********* 
c 

c********** ********************************************* ********* 

c 

c get  inital  error  and  negative  gradient 
c 

call  func (itermax  .gt.  0,  niam,  w,  error,  gm,numl,num2, 

1 wf actor , vinp , vout , target , npats , ninp , nhid , nout , 

2 maxpat , maxins , maxhid , maxout , ncalls , vhid , 

3 deltal , delta2 , hderiv , Oder  iv , iwrt , f prun) 
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c 

c get  the  norm  of  the  weight  vector 
c 

wsiz  = snrm2(niim,  w,  i) 
c 

c initialize  the  iteration  counter  if  itermax  > 0 
c 

iter  = 0 


c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


compute  the  rms  error. 

Note  rms  error  is  the  sqrt  of  the  sums  of  square  differences 

with  no  leading  1/2.  error  has  a leading  1/2  that  must  be  canceled 

by  the  2 multiplier.  Note  that  error  includes  division  by 

the  product  of  the  number  of  patterns  and  the  number  of 

outputs  when  passed  back  from  func.  Therefore  the  next  line  forms  a 

legitimate  rms  error  by  computing  the  norm  of  a vector  and  dividing 

by  the  square  root  of  its  length. 

rmserr  = sqrt (error  * 2) 

initialize  error  flag  to  indicate  exit  with  iteration  count 
larger  than  itermax.  This  is  the  default  setting. 

ierr  = 1 

initialize  relative  distance  for  numerical  derivative 
sigma  = l.e-4 

initialize  lambda  to  lambda__l  > 0 
xl  = xlstart 

initailize  lambda_l  bar  to  0 
xlb  = 0 

initialize  positive  definiteness  parameter  to  0 
deltak  = 0 

initialize  logical  flag  identifying  that  a reduction  in 
error  function  can  be  made 

success  = .true. 

save  the  current  rms  error 

rmsold  = rmserr 

identify  how  often  to  restart  the  algorithm.  The  conjugate 
gradient  algorithms  must  be  restarted  after  the  total  number 
of  iterations  becomes  a multiple  of  the  number  of  unknowns. 

This  is  because  the  function  being  minimized  is  not  necessarily 
quadratic. 

iover  = num 

initialize  the  number  of  iterations  since  last  restart.  This 
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c counter  gets  reset  after  each  restart.  It  is  used  to  test 
c against  iover. 
c 

icount  = 0 
c 

c initialize  the  number  of  failed  iterations  in  a row.  An  iteration 
c is  considered  a failure  if  it  does  not  reduce  the  error  function, 
c This  counter  will  be  used  as  a termination  check.  The  s\ibroutine 
c will  stop  on  too  many  failures  in  a row. 
c 

fcount  = 0 
c 

c initialize  the  number  of  iterations  since  last  convergence  check 
c 

ncount  = 0 
c 

c save  the  max  iterations 
c 

iter  = itermax 
c 

c 

C ********  get  initial  search  direction******** 

c 

c********************************************************* 

c 

c initialize  both  the  step  direction  and  current  steepest 
c descent  array  to  the  initial  steepest  descent 
c 

do  23006  n = 1,  num 
p(n)  = gm(n) 
r(n)  = gin(n) 

23006  continue 

c 

0******************************************************** 

c 

c ********  initialize  iteration  counter  ********* 
c 

c******************************************************** 

c 

c initialize  the  iteration  counter.  This  index  continues  to 
c accumulate  no  matter  how  many  restarts, 
c 

k = 0 

write (*,9000) 

9000  format(*  Iteration,  rmserr*) 

else 
c 

c************* ********************************** ********* 

c 

C ********  RUNNING  TESTS  ONLY  ********* 
c 

c* ********************************************** ********* 

c 

c When  itermax  <=  0: 

c Get  initial  function  value  and  rms  error 
c Set  return  flag 
c 
c 
c 
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c get  inital  error  only 
c 

call  func(itenQax  .gt.  0,  num,  w,  error,  gm,niml,num2, 

1 wf  actor , vinp , vout , target , npats , ninp , nhid , nout , 

2 maxpat , maxins , maxhid , maxout , ncalls , vhid , 

3 deltal , delta2 , hderiv , oderiv , iwrt , f prun) 
c 

c get  the  norm  of  the  weight  vector 
c 

wsiz  = snrm2  (niim,  w,  1) 
c 

c set  error  index  if  itermax  <=  0 then  exit  with  rms  error 
c 

iter  = -1 
c 

c compute  the  rms  error  no  matter  what  itermax  is 
c 

rmserr  = sqrt (error  * 2) 
c 

c return  if  itermax  <=  0,  This  would  happen  on  testing  rxins 
c and  not  training  runs 
c 

return 
end  if 
c 

c* ***************************** ****************************** 
c 

C ******  main  iteration  loop  ************* 

c 

c*********** ********************************* **************** 
c 

c Entry  for  the  main  iteration  loop 

c if  the  iteration  counter  hits  the  maximum  iterations  then  exit  the 
c routine 
c 

23008  if(k  .ge.  itermax)  then 
goto  23009 
end  if 
c 

c increment  the  counter  for  the  number  of  iterations  since 
c the  last  restart 
c 

icount  = icount  + 1 
c 

c get  the  norm  of  the  current  direction  vector 
c 

psiz  = snrm2(num,  p,  1) 
c 

c square  this  norm 
c 

psq  = psiz**2 
c 

c if  a reduction  in  the  error  can  be  made  compute  the 
c quadratic  terms  otherwise  only  adjust  parameters  with  current 
c quadratic  information  by  skipping  to  step  3 of  the  algorithm, 
c A reduction  in  error  cannot  be  made  if  the  comparison 
c delta  is  < 0.  If  so  try  scaling  the  current  Hessian  again,  get 
c a new  step  size  and  recompute  the  comparison  pareuneter  delta, 
c 

if (success  .eg.  .true.)  then 
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C*********  ***************  ************icit*****i:icic***  ********* 

C 

c********  GET  SECOND  ORDER  DIRECTIONAL  DERIVATIVE  ********* 
c 

Q********  *********************  *icic*****  ********************* 

C 

c a reduction  in  the . error  function  can  be  made 
c get  divisor  for  approximate  second  derivative  info 
c 

sigmak  = sigma  * wsiz  / psiz 
do  23012  n = 1,  num 

wnew(n)  = w(n)  + sigmak  * p(n) 

23012  continue 

c 

c get  error  and  negative  gradient  at  the  new  weight  vector  point 
c 

call  func(,true.,  num,  wnew,  enew,  s,numl,num2,wf actor, 

1 vinp , vout , target , npats , ninp , nhid , nout , 

2 maxpat , maxins , maxhid , maxout , ncalls , vhid , 

3 deltal , delta2 , hderiv , oderiv , iwrt , f prun) 
c 

c Compute  the  second  order  directional  derivative  along  p 
c 

do  23014  n = 1,  num 

s(n)  = (gm(n)  - s(n))  / sigmak 
23014  continue 

c 

c compute  definiteness  check  pareimeter 
c 

deltak  = sdot(num,  s,  1,  p,  1) 
end  if 
c 

c************** *************************************  ******** 

c 

C *********  make  hessian  POSITIVE  DEFINITE  *********** 
c 

c*** ************************************************* ******* 

c 

c section  to  scale  approximate  Hessian 
c and  definiteness  parcuneter 
c 

c skip  to  this  section  if  a reduction 
c in  error  could  not  be  previously  done, 
c Attempt  to  scale  and  try  stepping  again, 
c The  subroutine  will  terminate  if  too  many  attempts 
c in  a row  are  made  to  try  to  reduce  the  error  and 
c fail, 
c 

c save  Iconbdak  - lambdak_bar 
c 

c = xl  - xlb 
c 

c if  it  is  0 skip  the  scaling 
c 

if(c  .ne.  0.0)  then 
c 

c scale  the  approximate  Hessieui  and  the  definiteness  parameter 
c 

do  23018  n = 1,  num 
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23018 


s(n)  = s(n)  + c * p(n) 
continue 

deltak  = deltak  + c * psq 
end  if 
c 
c 
c 

c test  for  the  definiteness  of  the  Hessian 
c if  it  is  not  positive  definite  (deltak  <=0)  scale  the 
c second  order  term  to  make  it  positive  definite 
c before  taking  a new  step. 

c If  it  is  positive  definite  skip  to  get  a new 
c step  size 
c 
c 
c 

if (deltak  .le.  0)  then 

c = xl  - 2 * deltak  / psq 
do  23022  n = 1,  num 

s(n)  = s(n)  + c * p(n) 

23022  continue 

xlb  = 2 * (xl  - deltak  / psq) 
deltak  = - deltak  + xl  * psq 
xl  = xlb 
end  if 
c 

c****************** *************************** ************ 
c 

c ********  COMPUTE  NEW  STEP  LENGTH  ********** 
c 

c**** ************************************** *************** 
c 

xmu  =sdot(num,  p,  1,  r,  1) 
alpha  = xmu  / deltak 
c 

0****** ********4:* *****************************  *********** 

C 

C *********  COMPUTE  NEW  WEIGHT  VECTOR  ************ 
c 

C******************************************************** 

c 

do  23024  n = 1,  num 

wnew(n)  = w(n)  + alpha  * p(n) 

23024  continue 

c 

c******************** *********************** ************ 

c 

C ********  COMPUTE  QUADRATIC  TEST  PARAMETER  ******** 
c 

C*  ********  4r*  ****  4:  ****4;'*4(**4r***  *4;*  ********  *4;*  *********** 

C 

c get  new  function  value  and  negative  gradient 
c compute  quadratic  approximation  test  parameter 
c A reduction  in  error  can  be  made  if  delta  >=0 
c otherwise  delta  < 0 means  a reduction  could  not 
c be  made, 
c 

call  func(.true.,  num,  wnew,  enew,  gm,numl,num2, 

1 wf  actor , vinp , vout , target , npats , ninp , 

2 nhid , nout , maxpat , maxins , maxhid , maxout , 
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3 ncalls , vhid , deltal , delta2 , hderiv , oderiv , iwrt , f prun) 

delta  = 2 * deltak  * (error  — enew)  / xmu**2 
c 

c if  a reduction  in  error  can  be  made  (delta  >=  0)  get  a new 
c conjugate  direction,  otherwise  get  a count  of  the  failures, 
c 

if (delta  .ge.  0.0)  then 
c 

c************************************************************ 

c 

c ********  SETUP  FOR  NEW  CONJUGATE  DIRECTION  ********** 

C ***  OTHERWISE  TEST  FOR  TOO  MANY  UNSUCCESSFUL  STEPS  ** 
c 

c* ******************************************** *************** 

c 

c a reduction  in  error  was  made 
c set  up  new  conjugate  direction  and 
c adjust  parcuneters  to  maintain  a 
c trustworthy  quadratic  approximation 
c 

c iteration  counter  czm  be  updated  as  well  as  the  count  of  the  number 
c of  iterations  since  the  last  convergence  check.  Since  this  is  a 
c successful  iteration  the  number  of  failed  iterations  in  a row  can 
c be  reset  to  0 
c 

k = k + 1 

write ( fpscr, *)  k,rmserr 

if  (mod(k,nfreq)  .eq.  0)  write(*,*)  k,rmserr 
ncount  = ncount  + 1 
fcount  = 0 
c 

c** ******************************************* ************** 

c 

c **********  UPDATE  THE  WEIGHT  VECTOR  ************ 
c 

^**********************************************************‘4: 

C 

do  23028  n = 1,  num 
w(n)  = wnew(n) 

23028  continue 

c 

c compute  its  norm 
c 

wsiz  = snrm2(num,  w,  1) 
c 

c update  the  current  error  function  value 
c 


error  = enew 
xlb  = 0 

success  = . true . 
c 

0*** ************************************************ ****** 
c 

C **********  TEST  FOR  ALGORITHM  RESTART  ************ 
c 

c*** ************************************************ ****** 

c 

if (mod(icount,  iover)  .eq.  0)then 
c 

c********************************************************* 
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c 

c **********  Qg»p  CONJUGATE  DIRECTION  ************ 

C 

C**** *******4:**** *************************** ************** 

C 

c restart  by  setting  current  direction  to  steepest  descent 
c 

do  23032  n = 1,  num 
p(n)  = gm(n) 

23032  continue 

else 
c 

c if  there  is  no  restart  get  a new  conjugate  direction 
c 

beta  = (sdot(nuiii,  gm,  1,  gm,  1)  - sdot(nuia,  gm,  1,  r,  1)) 

& / xmu 

do  23034  n = 1,  num 

p(n)  = gm(n)  + beta  * p(n) 

23034  continue 

end  if 
c 

c After  getting  new  direction  update  current  steepest  descent 
c 

do  23036  n = 1,  num 
r(n)  = gm(n) 

23036  continue 

c 

c************************************************************ 

c 

c ******  test  quadratic  trustworthiness  *********** 
c 

c************************************************************ 

c 

if (delta  .ge.  0.75)  then 
c 

c quadratic  approximation  is  trustworthy  at  this  point,  can  reduce 
c leunbda  to  increase  step  size 
c 

xl  = xl  / 2 
c 

c test  whether  delta  < 0.25 
c 

c not  nearly  as  good  a quadratic  approximation  therefore  increase  xl  to 
c reduce  the  step  length 
c 

else  if (delta  .It.  0.25)  then 
xl  = 4.0  * xl 
end  if 
else 
c 

c******** **************************************** ************ 
c 

c **********  test  for  too  many  unsuccessful  steps  *********** 

c ***************  COME  HERE  IF  DELTA  < 0 ******************** 
c 

C************************************************************ 

c 

c end  if  delta  < 0 too  many  times 
c 

c unsuccessful  step  and  error  cannot  be  reduced 
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c increment  iteration  failure  counter 
c 

xlb  = xl 

success  = .false, 
fcount  = fcount  + 1 
c 

c Do  not  exit  on  a single  failure  to  reduce  the  error.  Attempt  to 
c scale  another  time.  However  set  two  failures  as  a limit, 
c If  there  are  fewer  than  two  failures  in  a row  then  go  on  to  step  8 
c to  adjust  lambdak  upwards  which  reduces  the  step  size  and  possibly 
c the  local  error 
c 

if (fcount  .gt.  2)  then 
c 

c if  the  number  of  iterations  since  the  last  restart  is  not  greater 
c than  the  iteration  failure  counter  then  exit 
c 

if(icount  .gt.  fcount)  then 
c 

c at  least  one  good  step  since  restart  then  restart  again 
c 

do  23044  n = 1,  num 
p(n)  = gm(n) 

23044  continue 

c 

c initialize  lambda_)c  to  lambda_l 
c 

xl  = xlstart 
c 

c set  l2aiibda__k  bar  back  to  0 
c 

xlb  = 0 

success  = .true, 
delta  = 1.0 
icount  = 0 
fcount  = 0 
c 

c return  with  error  index  if  the  failure  count  is  > 2 and  the 
c number  of  iterations  since  the  last  restart  is  <=  the  failure  count 
c 

else 

ierr  = 3 
iter  = k 
goto  23009 
end  if 
end  if 
end  if 
c 

Qlc’k'k1citfc1tic1cieic1:4:1cic1c1ticicic4c1cicici(icic’kicicTkicicic'kic'ic*ie’k1c‘kicicicicicicic1cie‘k1ci:icic1cic±ic 

C 

c ********  test  for  convergence  ***************** 
c 

c*************************** ********************************* 

c 

c 

c adjust  xl  if  delta  < .25 
c 

if  (delta  .It.  0.25)  then 
xl  = 4.0*xl 

end  if 
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c 

c compute  current  rms  error  and  the  norm  of  the  current  descent 
c direction 
c 

rmserr  = sqrt (error  * 2) 
gsiz  = snrm2 (num,  r,  1) 
c 

c if  the  error  function  can  be  reduced  and  the  convergence  counter 
c is  >=  number  of  iterations  before  convergence  check  then  reset 
c convergence  check  counter  and  test  for  convergence, 
c 

if ((success  .eg.  .true.)  .and.  (ncount  .ge.  nfreq) ) then 
ncount  = 0 
c 

c Terminate  if  convergence  too  slow 
c 

if  (rmserr  .gt.  errdel  * rmsold)  then 
ierr  = 3 
iter  = k 
goto  23009 
end  if 
c 

c save  the  current  rms  error  for  future  check 
c 

rmsold  = rmserr 
end  if 
c 

c Terminate  when  error  satisfactory 
c 

if (rmserr  .It.  egoal)  then 
ierr  = 0 
iter  = k 
goto  23009 
end  if 
c 

c Terminate  when  gradient  is  too  small 
c 

if (gsiz  .It.  gwgoal  * max (1.0,  wsiz))  then 
ierr  = 2 
iter  = k 
goto  23009 
end  if 
c 

Ck  icicle  It  Icicle  icicle  it  1c1e1c1cieieicie1c-k1e1c1e"k1c1c’kic1('iiic-k’k1cicic1cicic-kicici:-kie'k"kic‘kieicic1cie-k-k 
C 

C *******  GO  BACK  TO  TOP  OF  MAIN  LOOP  ******** 
c 

c********************************************************** 

c 

goto  23008 
c 

C**************4r************************* ****************** 

C 

c **********  exit  ********** 

C 

C***************************************** ********** ******** 

c 

c compute  ratio  of  descent  magnitude  over  weight  magnitude 
c 

23009  continue 
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gw  = gsiz  / wsiz 
c 

c rewind  scratch  file  and  write  out  graphics  file 
c 

ncol  = 2 

rewind ( f pscr ) 

write ( f pgrph , * ) iter , ncol 

do  24000  i = loiter 

read (f pscr, *)  it,rer 
write (f pgrph,*)  it,rer 
24000  continue 

return 
end 

c********************************************************** 

c Kolmogorov-Smimov  one-distribution  test 

c********************************************************** 
SUBROUTINE  KSONE (DATA,N, D, PROB) 
c********************************************************** 
c 

c From  Press,  et  al.  "Numerical  Recipes" 
c 

c Given  an  array  of  N values,  DATA,  and  given  a user  supplied 
c function  of  a single  variable,  CUMUL,  which  is  the 
c cumulative  distribution  function  ranging  from  0 (for 
c smallest  values  of  its  argument)  to  1 (for  largest  values 
c of  its  argument) , this  routine  returns  the  K-S  statistic,  D, 
c and  the  significance  level  PROB.  Small  values  of  PROB  show 
c that  the  cumulative  distribution  function  of  DATA  is 
c significantly  different  from  CUMUL.  The  array  DATA  is 
c modified  by  being  sorted  into  ascending  order, 
c 

Q’k'k'k'k'k'kieic'k'kie'k'k'k'k'kit’kit'kit'k'k'kifkitie'kic'kic'kic'k'kic'k-kicic'kit'k'kicicieickifk'k'k'k'k'k 

DIMENSION  DATA(N) 

CALL  SORT (N, DATA) 

EN=N 

D=0. 

F0=0. 

DO  11  J=1,N 
FN=J/EN 

FF=CDMUL(DATA(J) ) 

DT=AMAXl(ABS(FO-FF)  ,ABS(FN-FF) ) 

IF(DT.GT.D)D=DT 

F0=FN 

11  CONTINUE 

PROB=PROBKS (SQRT (EN) *D) 

RETURN 

END 

Q'k'kific'kieicicit'kieicle'k'k’kit'kitle'k'kieic'k’k’k’kifkifkitifk'kieifkitifk’k'k'k’k'kit'kieltit'kic’kit 

FUNCTION  PROBKS(ALAM) 

c******************************************************** 

c 

c From  Press,  et  al.,  "Numerical  Recipes" 

^ c 

c This  function  calculates  the  significance  for  the 
c K-S  statistic 
c 

C'kic-kic-kicle'k-kicleiciciclelelc'kic’kicicicicic'kicie'krkicifkifit'kie'kieicieieic-kie-kli’k’kicicic'k'kieic'k 

PARAMETER  (EPS1=0.001,  EPS2=l.E-8) 

A2=-2 . *ALAM**2 
FAC=2 . 
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PROBKS=0 . 

TERMBF=0 . 

DO  11  J=l,100 

TERM=FAC*EXP (A2*J**2 ) 

PROBKS=PROBKS+TERM 

IF(ABS(TERM)  . LT.EPSl*TERMBF.OR.  ABS  (TERM)  . LT. EPS 2* PRO BKS)  RETURN 

FAC=-FAC 

TERMBF=ABS  (TERM) 

11  CONTINUE 
PROBKS=l . 

RETURN 

END 

C********************************************************* 

FUNCTION  ERFCC(X) 

0********************************** ***************** ****** 
c 

c From  Press,  et  al.,  "Numerical  Recipes” 
c 

c Returns  the  complimentary  error  function  with 
c fractional  error  less  than  1.2e-7 
c 

0********************************************************* 

Z=ABS(X) 

T=1./(1«+0*5*Z) 

ERFCC=T*EXP(-Z*Z-1. 26551223+T* (1. 00002368+T* ( . 37409196+ 

* T* ( . 09678418+T* (-. 18628806+T* ( . 27886807+T* (-1.13520398+ 

* T*(1.48851587+T*(-.82215223+T*. 17087277)  )))))))) 

IF  (X.LT.O.)  ERFCC=2.-ERFCC 

RETURN 

END 

0********************************************************* 

SUBROUTINE  SORT(N,RA) 

0************** ********************************** ********** 
c 

c From  Press,  et  al.,  "Numerical  Recipes" 
c 

c Sorts  an  array,  RA,  of  length  N into  ascending  numerical 
c order  using  the  Heapsort  algorithm.  N is  input;  RA  is 
c replaced  on  output  by  its  sorted  rearrangement, 
c 

0******************** ************************************** 

DIMENSION  RA(N) 

L=N/2+l 

IR=N 

10  CONTINUE 

IF(L.GT.1)THEN 

L=L-1 

RRA=RA(L) 

ELSE 

RRA=RA(IR) 

RA(IR)=RA(1) 

IR=IR-1 

IF(IR.EQ.1)THEN 

RA(1)=RRA 

RETURN 

ENDIF 

ENDIF 

I=L 

J=ia-L 

20  IF(J.LE.IR)THEN 
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IF(J.LT.IR)THEN 

IF(RA(J)  .LT.RA(J+1)  ) J=J+1 
ENDIF 

IF (RRA. LT . RA ( J) ) THEN 
RA(I)=RA(J) 

I=J 
J=J+J 
ELSE 
J=IR+1 
ENDIF 
GO  TO  20 
ENDIF 
RA(I)=RRA 
GO  TO  10 
END 

function  erf(x) 

Ck 'k'k-k’k’kiclciclc’klcle  icicle  it  leieic'k’k’k-k'kic'k-kicieieic'kic'k-kicic’kic'kicicicicicic'kic’kic’k-kicicicic 
C 

c This  function  computes  the  error  function 
c 

Qicieieieicicicicicleicicicicleicieleicieieicicieieicieicieieicieieicieieicicicicicleieicicieieicicicicicicieieiei: 

erf  =1.  - erfcc(x) 

return 

end 

Ckicic  icicle  a icicicicielcicicicicicleicieicicicieicicicieicieicicicieleicicicicieicicielcicicicicicicicicicicic 

function  cumul(x) 

Qieicieieieieicieieicieieieieieieieicieieieicicieieieieicieieieieieieieieieieieieieicieieieieieieieicicieieieieie 

C 

c This  function  returns  the  cumulative  distribution  of 
c the  (0,1)  normal  distribution 
c 

Qieieicicieieieicieieieicicieicieieieieicicicicieicicicicicicicieicicicicieieicieicieicieieicicieieicicicieieieic 

if  (x  .ge.  0.)  then 

cumul  = 0.5* (l.+erf (x/ (sqrt(2. ) ) ) ) 

else 

cumul  = 1.  - 0.5* (l.+erf (-x/(sqrt (2. ))) ) 

endif 

return 

end 


129 


APPENDIX  B 

SAMPLE  OUTPUT  FILE 
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X * 


NEURAL  NET  RUN  REPORT 

****************’****it***********^******rfc’**’**********************************^**.* 

***★•*•*★********★***★**•****************★******************'********************* 


DATA  SET 


******************************************************************************* 


Training  on  xdspdown.tm 


The  number  of  data  patterns  is  397 


******************************************************************************** 


NETWORK  SPECIFICATION 


**********it******************************** ************************************ 


INPUT  NODES 

HIDDEN  NODES 

OUTPUT  NODES 

37 

60 

1 

WEIGHTS  (I  TO  H) 

WEIGHTS  (H  TO  0) 

TOTAL  WEIGHTi 

2280 

61 

2341 

******************************************************************************** 


RUN  PARAMETERS 


******************************************************************************* 
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Random  initial  weights,  seed  12345 

Regularization  factor  wf actor  = 1.300E-03 

1 

********************************************************************************** 


RUN  TERMINATION  CRITERIA 


******************************************************************************* 


Maximum  number  of  iterations  is  1500 
Error  checking  frequency  is  every  1 iterations 
Run  is  terminated  on  either  max  iterations  or 
one  of  the  criteria  below. 

RMS  error  <=  O.lOOE-02 

(RMS  of  g)  <=  O.lOOE-11  * (RMS  of  w) 

(RMS  err)  > 1.00  * (RMS  err  1 iterations  ago) 

max  iterations  = 1500  # of  weights  = 2341 

1 

******************************************************************************* 


RUN  RESULTS 


******************************************************************************* 


R-SQ  of  nonlinear  fit  = 0.9961917 


Pointwise  Error 


Mean 

0.2527817E-05 


Standard  Deviation 
0.1701827E-03 


Minimum 

-0.6710461E-03 


Maximum 

0.7603451E-03 


Data 

Index 

1 

2 

3 

4 

5 


Output 

Node 

1 

1 

1 

1 

1 


Desired 

Output 

-0.1163E-02 

-0-3243E-02 

-0.7414E-02 

-0.9268E-02 

-0.1280E-01 


Predicted 

Output 

-0.1325E-02 

-0.3874E-02 

-0.7286E-02 

-0.9583E-02 

-0.1326E-01 


Absolute 

Error 

0.1616E-03 

0.6303E-03 

-0.1279E-03 

0.3158E-03 

0.4662E-03 
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6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 

21 

22 

23 

24 

25 

26 

27 

28 

29 

30 

31 

32 

33 

34 

35 

36 

37 

38 

39 

40 

41 

42 

43 

44 

45 

46 

47 

48 

49 

50 

51 

52 

53 

54 

55 

56 

57 

58 

59 

60 

61 

62 

63 

64 

65 


1 

-0.1458E-01 

-0.1474E-01 

0,1646E-03 

1 

-0.1706E-01 

-0-1717E-01 

0.1092E-03 

1 

-0.1833E-01 

-0.1842E-01 

0.8106E-04 

1 

-0.2032E-01 

-0.2079E-01 

0.4792E-03 

1 

-0.2156E-01 

-0.2180E-01 

0.2378E-03 

1 

-0.2311E-01 

-0.2283E-01 

-0.2781E-03 

1 

-0.2318E-01 

-0.2311E-01 

-0.7785E-04 

1 

-0.4257E-02 

-0.3586E-02 

-0.6710E-03 

1 

-0.6372E-02 

-0.6389E-02 

0.1641E-04 

1 

-0.1008E-01 

-0.9664E-02 

-0.4157E-03 

1 

-0.1155E-01 

-0.1149E-01 

-0.6624E*04 

1 

-0.1518E-01 

-0.1489E-01 

-0.2914E-03 

1 

-0.1655E-01 

-0.1609E-01 

-0.4650E-03 

1 

-0.1884E-01 

-0,1832E-01 

-0.5189E-03 

1 

-0,1973E-01 

-0,1938E-01 

-0.3496E-03 

1 

-0.2196E-01 

-0.2135E-01 

-0.6079E-03 

1 

-0.2282E-01 

-0.2230E-01 

-0.5200E-03 

1 

-0.2343E-01 

-0.2282E-01 

-0.6117E-03 

1 

-0.1175E-02 

-0.8425E-03 

-0.3325E-03 

1 

-0.5328E-02 

-0.5506E-02 

0.1773E-03 

1 

-0.7181E-02 

-0.6983E-02 

-0.1988E-03 

1 

-0.1040E-01 

-0.1081E-01 

0.4007E-03 

1 

-0.1228E-01 

-0.1273E-01 

0.4491E-03 

1 

-0.1525E-01 

-0.1556E-01 

0.3107E-03 

1 

-0.1632E-01 

-0.1659E-01 

0.2763E-03 

1 

-0.1830E-01 

-0.1844E-01 

0.1442E-03 

1 

-0.1923E-01 

-0.1932E-01 

0.8399E-04 

1 

-0.2129E-01 

-0.2151E-01 

0.2196E-03 

1 

-0.2179E-01 

-0.2187E-01 

0.7486E-04 

1 

0.1793E-05 

-0.2798E-03 

0.2816E-03 

1 

-0.2097E-02 

-0.2078E-02 

-0.1882E-04 

1 

-0.6162E-02 

-0.5910E-02 

-0.2519E-03 

1 

-0.7888E-02 

-0.7834E-02 

-0.5306E-04 

1 

-0.1129E-01 

-0.1145E-01 

0.1600E-03 

1 

-0.1296E-01 

-0.1312E-01 

0.1557E-03 

1 

-0.1534E-01 

-0.1548E-01 

0.1386E-03 

1 

-0.1643E-01 

-0.1645E-01 

0.2010E-04 

1 

-0.1831E-01 

-0.1826E-01 

-0.5399E-04 

1 

-0.1953E-01 

-0.1919E-01 

-0.3315E-03 

1 

-0.2092E-01 

-0.2113E-01 

0.2115E-03 

1 

-0.2094E-01 

-0.2108E-01 

0.1425E-03 

1 

-0.1468E-02 

-0.1398E-02 

-0.6996E-04 

1 

-0.3597E-02 

-0.3544E-02 

-0.5235E-04 

1 

-0.7325E-02 

-0.6886E-02 

-0.4393E-03 

1 

-0.8745E-02 

-0.8568E-02 

-0.1776E-03 

1 

-0.1230E-01 

-0.1223E-01 

-0.6508E-04 

1 

-0.1367E-01 

-0.1364E-01 

-0.2880E-04 

1 

-0.1584E-01 

-0.1581E-01 

-0.3006E-04 

1 

-0.1671E-01 

-0.1678E-01 

0.7707E-04 

1 

-0.1888E-01 

-0.1860E-01 

-0.2891E-03 

1 

-0.1975E-01 

-0.2014E-01 

0.3973E-03 

1 

-0.2030E-01 

-0.2040E-01 

0.1018E-03 

1 

0.2488E-02 

0.1828E>02 

0.6604E-03 

1 

-0.1376E-02 

-0.1298E-02 

-0.7775E-04 

1 

-0.3241E-02 

-0.3020E-02 

-0.2215E-03 

1 

-0.6488E-02 

-0.6362E-02 

-0.1268E-03 

1 

-0.8302E-02 

-0.8139E-02 

-0.1638E-03 

1 

-0.1122E-01 

-0.1091E-01 

-0.3138E-03 

1 

-0.1219E-01 

-0.1208E-01 

-0.1158E-03 

1 

-0.1420E-01 

-0.1424E-01 

0.4655E-04 
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66 

67 

68 

69 

70 

71 

72 

73 

74 

75 

76 

77 

78 

79 

80 

81 

82 

83 

84 

85 

86 

87 

88 

89 

90 

91 

92 

93 

94 

95 

96 

97 

98 

99 

100 

101 

102 

103 

104 

105 

106 

107 

108 

109 

110 

111 

112 

113 

114 

115 

116 

117 

118 

119 

120 

121 

122 

123 

124 

125 


1 

-0.1507E-01 

-0.1533E-01 

0.2595E-03 

1 

-0.1700E-01 

-0.1699E-01 

-0.4636E-05 

1 

-0.1746E-01 

-0.1752E-01 

0.5755E-04 

1 

0.2593E-02 

0.1833E-02 

0.7603E-03 

1 

0.6198E-03 

0.2878E-03 

0.3320E-03 

1 

-0.3264E-02 

-0.3036E-02 

-0.2276E-03 

1 

-0.4950E-02 

-0.4917E-02 

-0.3323E-04 

1 

-0.8222E-02 

-0.8189E-02 

-0.3262E-04 

1 

-0.9858E-02 

-0.9702E-02 

-0.1557E-03 

1 

-0.1209E-01 

-0.1207E-01 

-0.1951E-04 

1 

-0.1316E-01 

-0.1311E-01 

-0.5524E-04 

1 

-0.1494E-01 

-0.1525E-01 

0.3142E-03 

1 

-0.1612E-01 

-0.1623E-01 

O.llOlE-03 

1 

-0.1741E-01 

-0.1735E-01 

-0.5937E-04 

1 

-0.1726E-01 

-0.1721E-01 

-0.4983E-04 

1 

0.6733E-03 

0.2910E-03 

0.3823E-03 

1 

-0.1272E-02 

-0.1278E-02 

0.6169E-05 

1 

-0.4924E-02 

-0.4856E-02 

-0.6874E-04 

1 

-0.6306E-02 

-0.6314E-02 

0.8279E-05 

1 

-0.9831E-02 

-0.9574E-02 

-0.2563E-03 

1 

-O.llllE-01 

-0.1078E-01 

-0.3309E-03 

1 

-0.1316E-01 

-0.1297E-01 

-0.1877E-03 

1 

-0.1406E-01 

-0.1405E-01 

-0.1310E-04 

1 

-0.1601E-01 

-0.1611E-01 

0.9975E-04 

1 

-0.1678E-01 

-0.1678E-01 

-0.1146E-05 

1 

-0.1721E-01 

-0.1714E-01 

-0.7141E-04 

1 

0.2579E-02 

0.1851E-02 

0.7282E-03 

1 

-0.1295E-02 

-0.1287E-02 

-0.8701E-05 

1 

-0.3260E-02 

-0.3002E-02 

-0.2582E-03 

1 

-0.6250E-02 

-0.6342E-02 

0.9177E-04 

1 

-0.8164E-02 

-0.8125E-02 

-0.3906E-04 

1 

-0.1105E-01 

-0.1082E-01 

-0.2297E-03 

1 

-0.1205E-01 

-0.1198E-01 

-0.7428E-04 

1 

-0.1400E-01 

-0.1406E-01 

0.6104E-04 

1 

-0.1487E-01 

-0.1512E-01 

0.2482E-03 

1 

-0.1679E-01 

-0.1675E-01 

-0,4665E-04 

1 

-0.1721E-01 

-0.1710E-01 

-0.1147E-03 

1 

0.2526E-02 

0.1836E-02 

0.6905E-03 

1 

0.6611E-03 

0.2722E-03 

0.3889E-03 

1 

-0.3156E-02 

-0.3043E-02 

-0.1126E-03 

1 

-0.4917E-02 

-0.4940E-02 

0.2303E-04 

1 

-0.8160E-02 

-0.8220E-02 

0.5995E-04 

1 

-0.9743E-02 

-0.9713E-02 

-0,3018E-04 

1 

-0.1205E-01 

-0.1208E-01 

0.3676E-04 

1 

-0.1307E-01 

-0.1310E-01 

0.2860E-04 

1 

-0.1482E-01 

-0.1519E-01 

0.3719E-03 

1 

-0.1594E-01 

-0.1615E-01 

0.2065E-03 

1 

-0.1720E-01 

-0.1710E-01 

-0.9839E-04 

1 

-0.1711E-01 

-0.1714E-01 

0.2684E-04 

1 

0.5598E-03 

0.3568E-03 

0.2031E-03 

1 

-0.1313E-02 

-0.1199E-02 

-0.1141E-03 

1 

-0.4561E-02 

-0.4660E-02 

0.9950E-04 

1 

-0.5753E-02 

-0.5913E-02 

0.1608E-03 

1 

-0.9051E-02 

-0.9135E-02 

0.8416E-04 

1 

-0.1022E-01 

-0.1014E-01 

-0.7277E-04 

1 

-0.1197E-01 

-0.1196E-01 

-0,8461E-05 

1 

-0.1273E-01 

-0.1267E-01 

-0.5241E-04 

1 

-0.1443E-01 

-0.1409E-01 

-0.3391E-03 

1 

-0.1503E-01 

-0.1475E-01 

-0.2820E-03 

1 

-0.1505E-01 

-0.1485E-01 

-0.2073E-03 
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126 

1 

0,2346E- 

127 

1 

-0.1313E- 

128 

1 

-0.3114E- 

129 

1 

-0.5853E- 

130 

1 

-0.7579E- 

131 

1 

-0.1019E- 

132 

1 

-0.1105E- 

133 

1 

-0.1275E- 

134 

1 

-0.1346E- 

135 

1 

-0.1500E- 

136 

1 

-0.1533E- 

137 

1 

0.2332E- 

138 

1 

0.5386E- 

139 
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-0.1564E-01 

0.1378E-06 

1 

-0.1532E-01 

-0,1552E-01 

0.2044E-03 

1 

-0.1442E-02 

-0,1548E-02 

0.1067E-03 

1 

-0.3239E-02 

-0.3317E-02 

0.7767E-04 

1 

-0.6271E-02 

-0.6495E-02 

0.2233E-03 

1 

-0.7444E-02 

-0.7616E-02 

0,1714E-03 

1 

-0.1035E-01 

-0.1046E-01 

0.1144E-03 

1 

-0.1137E-01 

-0.1151E-01 

0.1408E-03 

1 

-0.1278E-01 

-0.1297E-01 

0.1897E-03 

1 

-0.1342E-01 

-0.1356E-01 

0.1477E-03 

1 

-0.1490E-01 

-0.1484E-01 

-0.5431E-04 

1 

-0.1538E-01 

-0.1544E-01 

0.5332E-04 

1 

-0.1534E-01 

-0.1542E-01 

0.7625E-04 

1 

0.1402E-03 

0.1743E-03 

-0.3406E-04 

1 

-0.3367E-02 

-0.3362E-02 

-0.4798E-05 
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366 

1 

-0.4870E-02 

-0.5203E-02 

0.3329E-03 

367 

1 

-0.7444E-02 

-0.7655E-02 

0.2108E-03 

368 

1 

-0.9024E-02 

-0.9137E-02 

0.1137E-03 

369 

1 

-0.1137E-01 

-0.1156E-01 

0.1853E-03 

370 

1 

-0.1203E-01 

-0.1226E-01 

0.2285E-03 

371 

1 

-0.1344E-01 

-0.1362E-01 

0.1755E-03 

372 

1 

-0.1403E-01 

-0.1424E-01 

0.2143E-03 

373 

1 

-0.1541E-01 

-0.1550E-01 

0.8990E-04 

374 

1 

-0.1567E-01 

-0.1562E-01 

-0.5090E-04 

375 

1 

0.1656E-03 

0.1723E-03 

-0.6694E-05 

376 

1 

-0.1515E-02 

-0.1587E-02 

0.7160E-04 

377 

1 

-0.4870E-02 

-0.5215E-02 

0.3451E-03 

378 

1 

-0.6322E-02 

-0.6539E-02 

0.2165E-03 

379 

1 

-0.8948E-02 

-0.9163E-02 

0.2151E-03 

380 

1 

-0.1032E-01 

-0.1055E-01 

0.2268E-03 

381 

1 

-0.1206E-01 

-0.1232E-01 

0.2619E-03 

382 

1 

-0.1283E-01 

-0.1308E-01 

0.2531E-03 

383 

1 

-0.1403E-01 

-0.1431E-01 

0.2854E-03 

384 

1 

-0.1490E-01 

-0.1497E-01 

0.7639E-04 

385 

1 

-0.1564E-01 

-0.1571E-01 

0.6475E-04 

386 

1 

-0.1537E-01 

-0.1564E-01 

0.2691E-03 

387 

1 

-0.1531E-02 

-0.1609E-02 

0.7815E-04 

388 

1 

-0.3367E-02 

-0.3397E-02 

0.3005E-04 

389 

1 

-0.6373E-02 

-0.6587E-02 

0.2146E-03 

390 

1 

-0.7444E-02 

-0.7718E-02 

0.2736E-03 

391 

1 

-0.1035E-01 

-0.1061E-01 

0.2599E-03 

392 

1 

-0.1132E-01 

-0.1167E-01 

0.3484E-03 

393 

1 

-0.1283E-01 

-0.1316E-01 

0.3296E-03 

394 

1 

-0.1339E-01 

-0.1376E-01 

0.3719E-03 

395 

1 

-0.1487E-01 

-0.1505E-01 

0.1775E-03 

396 

1 

-0.1538E-01 

-0.1565E-01 

0.2622E-03 

397 

1 

-0.1537E-01 

-0.1571E-01 

0.3394E-03 

Kolmogorov-  Smirnov 

distance 

statistic  = 

0.9152800E-01 

Kolmogorov-  Smirnov 

significance  value  = 

0.2583816E-02 

Iter  1500;  ierr  1 : iteration  limit 

Used  1500  iterations;  3011  function  calls;  Err  0.000;  |g|/|w|  1.018E-04 

Rms  change  in  weights  0.070 

Weights  written  to  file  xdspdntmwt . out 
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